[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

Kirill Kozlov (Jira) Wed, 13 Nov 2019 12:31:00 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973664#comment-16973664
 ]


Kirill Kozlov commented on BEAM-8042:
-------------------------------------

[~kanterov] After looking more into this, I suspect that it is a 
ResolvedAggregateScan bug. Aggregate list, which stores ResolvedComputedColumns 
stores wrong column indexes in argumentList. 

Following test cases pass:
{code:java}
String sql = "SELECT \n" +
    "  id, \n" +
    "  COUNT(id) as count, \n" +
    "  SUM(has_f1) as f1_count, \n" +
    "  SUM(has_f2) as f2_count, \n" +
    "  SUM(has_f3) as f3_count, \n" +
    "  SUM(has_f4) as f4_count, \n" +
    "  SUM(has_f5) as f5_count, \n" +
    "  SUM(has_f6) as f6_count  \n" +
    "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
    "GROUP BY id";
{code}
{code:java}
String sql = "SELECT \n" +
    "  id, \n" +
    "  SUM(has_f1) as f1_count, \n" +
    "  SUM(has_f2) as f2_count, \n" +
    "  SUM(has_f3) as f3_count, \n" +
    "  SUM(has_f4) as f4_count, \n" +
    "  SUM(has_f5) as f5_count, \n" +
    "  SUM(has_f6) as f6_count, \n" +
    "  COUNT(*) as count \n" +
    "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
    "GROUP BY id";
{code}
 

In case when COUNT is last the combined aggregates look like this (as they 
should, used field indexes are correct):
{code:java}
aggCalls = {ArrayList@5898}  size = 7
 0 = {AggregateCall@5899} "SUM($1)"
 1 = {AggregateCall@5947} "SUM($2)"
 2 = {AggregateCall@5948} "SUM($3)"
 3 = {AggregateCall@5949} "SUM($4)"
 4 = {AggregateCall@5950} "SUM($5)"
 5 = {AggregateCall@5951} "SUM($6)"
 6 = {AggregateCall@5952} "COUNT()"
{code}
When COUNT is first (used field indexes are offset by 1):
{code:java}
aggCalls = {ArrayList@5898}  size = 7
 0 = {AggregateCall@5899} "COUNT()"
 1 = {AggregateCall@5947} "SUM($2)"
 2 = {AggregateCall@5948} "SUM($3)"
 3 = {AggregateCall@5949} "SUM($4)"
 4 = {AggregateCall@5950} "SUM($5)"
 5 = {AggregateCall@5951} "SUM($6)"
 6 = {AggregateCall@5952} "SUM($7)"
{code}
It increases fields used by aggregates that happen after COUNT by 1, thus 
causing an index out of bound exception when validating the last one. Even if 
out of bound exception was not thrown, this would produce the wrong result, 
because SUM aggregates would happen over the wrong fields.

> Parsing of aggregate query fails
> --------------------------------
>
>                 Key: BEAM-8042
>                 URL: https://issues.apache.org/jira/browse/BEAM-8042
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
>     BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
>     opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
>     return opts;
>   }
>   @Test
>   public void testAggregate() {
>     Schema inputSchema = Schema.builder()
>         .addByteArrayField("id")
>         .addInt64Field("has_f1")
>         .addInt64Field("has_f2")
>         .addInt64Field("has_f3")
>         .addInt64Field("has_f4")
>         .addInt64Field("has_f5")
>         .addInt64Field("has_f6")
>         .build();
>     String sql = "SELECT \n" +
>         "  id, \n" +
>         "  COUNT(*) as count, \n" +
>         "  SUM(has_f1) as f1_count, \n" +
>         "  SUM(has_f2) as f2_count, \n" +
>         "  SUM(has_f3) as f3_count, \n" +
>         "  SUM(has_f4) as f4_count, \n" +
>         "  SUM(has_f5) as f5_count, \n" +
>         "  SUM(has_f6) as f6_count  \n" +
>         "FROM PCOLLECTION \n" +
>         "GROUP BY id";
>     pipeline
>         .apply(Create.empty(inputSchema))
>         .apply(SqlTransform.query(sql));
>     pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>       at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>       at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>       at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>       at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>       at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>       at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>       at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>       at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>       at 
>       ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>       at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>       at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>       at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>       at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>       ... 48 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

Reply via email to