[ https://issues.apache.org/jira/browse/DRILL-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511502#comment-14511502 ]
Sean Hsuan-Yi Chu commented on DRILL-2376:
------------------------------------------
The issue actually resides in StreamAgg, which gave Union-All the wrong
information about a schema change.
You can reproduce the issue with the physical plan below.
(The plan is equivalent to "select sss from (select sum(1) as sss from
cp.`tpch/nation.parquet`) group by sss", but if the SQL is typed in directly,
Calcite would not choose this plan.)
{
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ {
      "name" : "planner.enable_streamagg",
      "kind" : "BOOLEAN",
      "type" : "SESSION",
      "bool_val" : true
    } ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "parquet-scan",
    "@id" : 4,
    "userName" : "hyichu",
    "entries" : [ {
      "path" : "/tpch/nation.parquet"
    } ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "classpath:///",
      "workspaces" : null,
      "formats" : {
        "csv" : {
          "type" : "text",
          "extensions" : [ "csv" ],
          "delimiter" : ","
        },
        "tsv" : {
          "type" : "text",
          "extensions" : [ "tsv" ],
          "delimiter" : "\t"
        },
        "json" : {
          "type" : "json"
        },
        "parquet" : {
          "type" : "parquet"
        },
        "avro" : {
          "type" : "avro"
        }
      }
    },
    "format" : {
      "type" : "parquet"
    },
    "columns" : [ "`*`" ],
    "selectionRoot" : "/tpch/nation.parquet",
    "cost" : 25.0
  }, {
    "pop" : "project",
    "@id" : 3,
    "exprs" : [ {
      "ref" : "`$f0`",
      "expr" : "1"
    } ],
    "child" : 4,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 25.0
  }, {
    "pop" : "streaming-aggregate",
    "@id" : 2,
    "child" : 3,
    "keys" : [ ],
    "exprs" : [ {
      "ref" : "`sss`",
      "expr" : "sum(`$f0`) "
    } ],
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "hash-aggregate",
    "@id" : 1,
    "child" : 2,
    "cardinality" : 1.0,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "groupByExprs" : [ {
      "ref" : "`sss`",
      "expr" : "`sss`"
    } ],
    "aggrExprs" : [ ],
    "cost" : 12.5
  }, {
    "pop" : "screen",
    "@id" : 0,
    "child" : 1,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  } ]
}
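
For comparison, the SQL that this plan is said to be equivalent to can be run against a reference engine to see what a correct execution should return. The following is only a sketch, not part of Drill: it uses Python's built-in sqlite3 module, and the in-memory `nation` table is a hypothetical stand-in for cp.`tpch/nation.parquet`, assuming the standard 25-row TPC-H nation table.

```python
import sqlite3

# Hypothetical stand-in for cp.`tpch/nation.parquet`.
# Assumption: the TPC-H nation table holds 25 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nation (n_nationkey INTEGER)")
conn.executemany("INSERT INTO nation VALUES (?)", [(i,) for i in range(25)])

# Same shape as the plan above: a keyless aggregate (sum over a constant 1)
# feeding a group-by on the aggregate's single output column.
rows = conn.execute(
    "SELECT sss FROM (SELECT SUM(1) AS sss FROM nation) GROUP BY sss"
).fetchall()
print(rows)  # [(25,)]
```

A correct execution of the plan should likewise produce exactly one row; anything else would point at the aggregate operators rather than at the scan.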
> UNION ALL on Aggregates with GROUP BY returns incomplete results
> ----------------------------------------------------------------
>
> Key: DRILL-2376
> URL: https://issues.apache.org/jira/browse/DRILL-2376
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.9.0
> Reporter: Abhishek Girish
> Assignee: Sean Hsuan-Yi Chu
> Fix For: 0.8.0
>
> Attachments: t1.parquet, t2.parquet
>
>
> The following query returns incomplete results:
> {code:sql}
> select x
> from
> (SELECT Sum(ss_ext_sales_price) x
> FROM store_sales
> UNION ALL
> SELECT Sum(cs_ext_sales_price) x
> FROM catalog_sales) tmp
> GROUP BY x;
> Results from Drill:
> +----------------+
> | x              |
> +----------------+
> | 3658019159.35  |
> +----------------+
> 1 row selected (3.474 seconds)
> Results from Postgres:
>        x
> ---------------
>  5265207074.51
>  3658019159.35
> (2 rows)
> {code}
> Removing GROUP BY returns the right results:
> {code:sql}
> select x
> from
> (SELECT Sum(ss_ext_sales_price) x
> FROM store_sales
> UNION ALL
> SELECT Sum(cs_ext_sales_price) x
> FROM catalog_sales) tmp;
> Results from Drill:
> +----------------+
> | x              |
> +----------------+
> | 5265207074.51  |
> | 3658019159.35  |
> +----------------+
> {code}
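
The expected semantics of the reported query can be checked in the same spirit on a toy data set. This is a minimal sketch using Python's built-in sqlite3 module as a reference engine; the table contents are invented for illustration and are not the data from the attached t1.parquet and t2.parquet files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-ins for store_sales and catalog_sales (values are made up).
conn.execute("CREATE TABLE store_sales (ss_ext_sales_price REAL)")
conn.execute("CREATE TABLE catalog_sales (cs_ext_sales_price REAL)")
conn.executemany("INSERT INTO store_sales VALUES (?)", [(10.0,), (20.0,)])
conn.executemany("INSERT INTO catalog_sales VALUES (?)", [(5.0,), (2.5,)])

# Same query shape as in the issue: two single-row aggregates combined
# with UNION ALL, then grouped by the aggregate column.
query = """
SELECT x FROM (
    SELECT SUM(ss_ext_sales_price) AS x FROM store_sales
    UNION ALL
    SELECT SUM(cs_ext_sales_price) AS x FROM catalog_sales
) tmp
GROUP BY x
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)  # [(7.5,), (30.0,)]
```

Because the two sums differ, GROUP BY x must keep both rows, which matches the Postgres result above and not the single row returned by Drill.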
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)