Re: Explain Plan for aggregating a single column in CSV Adapter

Julian Hyde Wed, 12 Jul 2017 10:21:30 -0700

I already wrote some ideas in
https://issues.apache.org/jira/browse/CALCITE-1876. Let's discuss
there.


On Tue, Jul 11, 2017 at 1:24 PM, Luis Fernando Kauer
<[email protected]> wrote:
> Hi,
>
> If I change CsvTranslatableTable so that it implements
> ProjectableFilterableTable instead of TranslatableTable and implement the
> scan method, Calcite's own rules apply and the plan comes out right, scanning
> only the field used in the aggregate function.
>
> However, I have now noticed that
>
> "select count(*) from EMPS" generates the plan:
> EnumerableAggregate(group=[{}], EXPR$0=[COUNT()])
>   CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>
> "select * from EMPS" generates the plan:
> CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>
> Notice that count(*) generates a plan that scans all fields, converting
> them all needlessly.
> Even with ProjectableFilterableTable the plan for count(*) scans all fields,
> whereas the plan for "select count(name) from EMPS" scans just one field.
> What would be the best approach to handle count(*) without having to scan
> all fields?
>
> Best regards,
>
> Luis Fernando
>
>
>
>
>
> On Thursday, July 6, 2017 18:05, Julian Hyde <[email protected]> wrote:
>
>
>
> Calcite should realize that Aggregate has an implied Project (because it only 
> uses a few columns) and push that projection into the CsvTableScan, but it 
> doesn’t.
>
> I think we need a new rule for Aggregate on a TableScan of a 
> ProjectableFilterableTable. Can you create a JIRA case please?
>
> I created a test case. It currently fails:
>
> diff --git a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
> index 00c59ee..2402872 100644
> --- a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
> +++ b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
> @@ -241,6 +241,13 @@ public Void apply(ResultSet resultSet) {
>          .ok();
>    }
>
> +  @Test public void testAggregateImpliesProject() throws SQLException {
> +    final String sql = "select max(name) from EMPS";
> +    final String plan = "PLAN=EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
> +        + "  CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])\n";
> +    sql("smart", "explain plan for " + sql).returns(plan).ok();
> +  }
> +
>    @Test public void testFilterableSelect() throws SQLException {
>      sql("filterable-model", "select name from EMPS").ok();
>    }
>
>
> Julian
>
>
>> On Jul 6, 2017, at 1:23 PM, Luis Fernando Kauer 
>> <[email protected]> wrote:
>>
>> Hi,
>>
>> I'm trying to understand the CSV Adapter and how the rules are fired. The
>> CsvProjectTableScanRule gets fired when I use CsvTranslatableTable. But I
>> don't understand why I'm getting a plan that scans all fields when I use an
>> aggregate function. For example:
>>
>> explain plan for select name from emps;
>> CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
>>
>> explain plan for select max(name) from emps;
>> EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
>>   CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>
>> I noticed that the rule gets fired, and at that point it shows just 1 field
>> being used. But the last time CsvTableScan.deriveRowType() gets called it has
>> all the fields set, and it's not the instance created by the rule, but the
>> first instance created with all the fields.
>>
>> Can anybody explain to me whether this is a bug or whether this is supposed
>> to happen with aggregate functions?
>>
>> Best regards,
>> Luis Fernando Kauer
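
The cost being discussed in this thread (converting every CSV field when the
query needs one field, or none for count(*)) can be illustrated with a small
standalone sketch. It does not use Calcite: the class ProjectedCsvScan, its
scan and convert methods, and the conversions counter are all hypothetical
names invented for illustration, and the int[] fields argument is only loosely
modeled on the projection argument passed to a projectable scan. It also
assumes a naive comma split with no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a CSV scan that converts just the projected fields. */
public class ProjectedCsvScan {
  /** Counts field conversions so the saving is observable. */
  static int conversions = 0;

  /** Hypothetical converter; a real adapter would parse by column type. */
  static Object convert(String raw) {
    conversions++;
    return raw.trim();
  }

  /**
   * Returns one Object[] per input line, holding only the requested fields.
   * An empty fields array yields zero-width rows, which is all that
   * count(*) needs: the row count survives, but nothing is converted.
   */
  static List<Object[]> scan(List<String> lines, int[] fields) {
    List<Object[]> rows = new ArrayList<>();
    for (String line : lines) {
      // Naive split: assumes no quoted or escaped commas in the data.
      String[] raw = line.split(",", -1);
      Object[] row = new Object[fields.length];
      for (int i = 0; i < fields.length; i++) {
        row[i] = convert(raw[fields[i]]);
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<String> lines =
        Arrays.asList("100,Fred,10", "110,Eric,20", "120,Wilma,20");

    // count(*): project no fields at all.
    int count = scan(lines, new int[0]).size();
    System.out.println("rows=" + count + " conversions=" + conversions);

    // max(name): project only field 1.
    scan(lines, new int[] {1});
    System.out.println("conversions after projecting one field=" + conversions);
  }
}
```

In this sketch, an empty projection plays the role of the plan one would like
to see for count(*): the scan still produces one row per line, but no field is
converted. Whether Calcite's planner can be made to produce such a scan is the
question raised above and tracked in CALCITE-1876.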
