Re: Explain Plan for aggregating a single column in CSV Adapter

Luis Fernando Kauer Tue, 11 Jul 2017 14:04:57 -0700

Hi,

If I change CsvTranslatableTable so that it implements 
ProjectableFilterableTable instead of TranslatableTable and implement the scan 
method, Calcite's own rules apply and the plan comes out right, scanning only 
the field used in the aggregate function.
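
To illustrate what a projection-aware scan does, here is a minimal sketch in plain Java. It is not Calcite code: the class ProjectedScanSketch, its scan method and the sample rows are made up for illustration. It only assumes what ProjectableFilterableTable's scan method passes in, an int[] of projected field indexes, with null taken to mean "all fields".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a projection-aware scan; not part of Calcite. */
public class ProjectedScanSketch {
  /**
   * Returns one Object[] per input line, holding only the fields listed in
   * projects, in that order. A null projects array is treated as "all fields"
   * (an assumption mirroring the scan contract described above).
   */
  public static List<Object[]> scan(List<String> lines, int[] projects) {
    List<Object[]> rows = new ArrayList<>();
    for (String line : lines) {
      String[] raw = line.split(",", -1);
      int[] fields = projects;
      if (fields == null) {
        // No projection requested: keep every field of this line.
        fields = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
          fields[i] = i;
        }
      }
      // Only the requested fields are copied; the rest are never touched.
      Object[] row = new Object[fields.length];
      for (int i = 0; i < fields.length; i++) {
        row[i] = raw[fields[i]];
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Made-up sample data: field 1 plays the role of "name".
    List<String> lines = Arrays.asList("100,Fred,10", "110,Eric,20");
    for (Object[] row : scan(lines, new int[] {1})) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Called with projects = {1}, each returned row holds a single value, which corresponds to a scan reported as fields=[[1]] in the plan.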


However, I have now noticed the following.

"select count(*) from EMPS" generates the plan:
EnumerableAggregate(group=[{}], EXPR$0=[COUNT()])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

"select * from EMPS" generates the plan:
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Notice that count(*) generates a plan that scans all fields, so every field 
gets converted even though none of them is needed.
The plan scans all fields even when using ProjectableFilterableTable, whereas 
the plan for "select count(name) from EMPS" scans just one field.
What would be the best approach to handle count(*) without having to scan 
all fields?

Best regards,

Luis Fernando





On Thursday, July 6, 2017 at 18:05, Julian Hyde <[email protected]> 
wrote:



Calcite should realize that Aggregate has an implied Project (because it only 
uses a few columns) and push that projection into the CsvTableScan, but it 
doesn’t.

I think we need a new rule for Aggregate on a TableScan of a 
ProjectableFilterableTable. Can you create a JIRA case please?
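
As a rough sketch of the implied projection such a rule would have to compute (plain Java, not Calcite's rule API; the class ImpliedProjectSketch and its methods are hypothetical), the fields an Aggregate reads are its group keys plus the arguments of its aggregate calls, and each argument must then be renumbered against the narrowed scan:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of an Aggregate's implied projection. */
public class ImpliedProjectSketch {
  /** Fields the aggregate reads: group keys plus every aggregate-call argument. */
  public static int[] usedFields(List<Integer> groupKeys,
      List<List<Integer>> aggArgs) {
    TreeSet<Integer> used = new TreeSet<>(groupKeys);
    for (List<Integer> args : aggArgs) {
      used.addAll(args);
    }
    int[] result = new int[used.size()];
    int i = 0;
    for (int field : used) {
      result[i++] = field; // TreeSet iterates in ascending order
    }
    return result;
  }

  /** Position of an original field index within the narrowed scan. */
  public static int remap(int[] usedFields, int field) {
    int pos = Arrays.binarySearch(usedFields, field);
    if (pos < 0) {
      throw new IllegalArgumentException("field " + field + " is not scanned");
    }
    return pos;
  }

  public static void main(String[] args) {
    // "select max(name) from EMPS": no group keys, one call reading field 1.
    List<Integer> noGroupKeys = Collections.emptyList();
    List<List<Integer>> maxOfName =
        Collections.singletonList(Collections.singletonList(1));
    int[] fields = usedFields(noGroupKeys, maxOfName);
    System.out.println("fields=" + Arrays.toString(fields)
        + " MAX($" + remap(fields, 1) + ")");
  }
}
```

For max(name) the sketch yields fields [1] with the argument renumbered to $0. An aggregate call with no arguments contributes nothing, so in this sketch the field list would be empty.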

I created a test case. It currently fails:

diff --git a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
index 00c59ee..2402872 100644
--- a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
+++ b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
@@ -241,6 +241,13 @@ public Void apply(ResultSet resultSet) {
         .ok();
   }

+  @Test public void testAggregateImpliesProject() throws SQLException {
+    final String sql = "select max(name) from EMPS";
+    final String plan = "PLAN=EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
+        + "  CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])\n";
+    sql("smart", "explain plan for " + sql).returns(plan).ok();
+  }
+
   @Test public void testFilterableSelect() throws SQLException {
     sql("filterable-model", "select name from EMPS").ok();
   }


Julian


> On Jul 6, 2017, at 1:23 PM, Luis Fernando Kauer 
> <[email protected]> wrote:
> 
> Hi,
> I'm trying to understand the CSV Adapter and how the rules are fired.
> The CsvProjectTableScanRule gets fired when I use CsvTranslatableTable.
> But I don't understand why I'm getting a plan that scans all fields when I
> use an aggregate function. For example:
> 
> explain plan for select name from emps;
> CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
> 
> explain plan for select max(name) from emps;
> EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
>   CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
> 
> I noticed that the rule gets fired, and at that point it shows just 1 field
> being used. But the last time CsvTableScan.deriveRowType() gets called it has
> all the fields set, and it's not the instance created by the rule, but the
> first instance, created with all the fields.
> Can anybody explain to me whether this is a bug or whether this is supposed
> to happen with aggregate functions?
> Best regards,
> Luis Fernando Kauer
