[
https://issues.apache.org/jira/browse/DRILL-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267233#comment-14267233
]
Jason Altekruse commented on DRILL-1394:
----------------------------------------
The only way I would think this could happen is if a schema change cut off the
aggregation 'group' which was supposed to group together all of the records
into two groups. The behavior of union all currently is to just pass along the
incoming data streams unmodified, there is no enforcement of type similarity.
An actual sql-compliant union currently only occurs if the names on either side
of the union all match, in this case they do. This bug is going to be fixed
soon, but even with the current version of the operator this query should work.
I don't see why there would be a schema change, but is there any way the two
columns are of a different type in the two files?
> COUNT(*) with UNION subquery returns two rows
> ---------------------------------------------
>
> Key: DRILL-1394
> URL: https://issues.apache.org/jira/browse/DRILL-1394
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.5.0
> Reporter: Bob Rumsby
> Assignee: Sean Hsuan-Yi Chu
> Priority: Critical
> Fix For: 0.8.0
>
>
> The following COUNT(*) query with a UNION subquery returns two rows, one
> count for each side of the union. Run by itself, the subquery returns 70000
> rows.
> 0: jdbc:drill:> select count(*) from (select trans_id from
> `clicks/clicks.campaign.json` union all select trans_id from
> `clicks/clicks.json`);
> +------------+
> | EXPR$0 |
> +------------+
> | 40000 |
> | 30000 |
> +------------+
> 2 rows selected (5.896 seconds)
> 0: jdbc:drill:> explain plan for select count(*) from (select trans_id from
> `clicks/clicks.campaign.json` union all select trans_id from
> `clicks/clicks.json`);
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-02 Project($f0=[0])
> 00-03 UnionAll(all=[true])
> 00-05 Project(trans_id=[$1])
> 00-07 Scan(groupscan=[EasyGroupScan
> [selectionRoot=/mapr/demorig/data/nested/clicks/clicks.campaign.json, columns
> = null]])
> 00-04 Project(trans_id=[$1])
> 00-06 Scan(groupscan=[EasyGroupScan
> [selectionRoot=/mapr/demorig/data/nested/clicks/clicks.json, columns = null]])
> | {
> "head" : {
> "version" : 1,
> "generator" : {
> "type" : "ExplainHandler",
> "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
> },
> "graph" : [ {
> "pop" : "fs-scan",
> "@id" : 7,
> "files" : [
> "maprfs:/mapr/demorig/data/nested/clicks/clicks.campaign.json" ],
> "storage" : {
> "type" : "file",
> "enabled" : true,
> "connection" : "maprfs:///",
> "workspaces" : {
> "root" : {
> "location" : "/mapr/demorig/data",
> "writable" : false,
> "storageformat" : null
> },
> "nested" : {
> "location" : "/mapr/demorig/data/nested",
> "writable" : true,
> "storageformat" : "parquet"
> },
> "flat" : {
> "location" : "/mapr/demorig/data/flat",
> "writable" : true,
> "storageformat" : "parquet"
> },
> "views" : {
> "location" : "/mapr/demorig/data/views",
> "writable" : true,
> "storageformat" : "parquet"
> },
> "yelp" : {
> "location" : "/mapr/demorig/data/yelp",
> "writable" : true,
> "storageformat" : "json"
> }
> },
> "formats" : {
> "psv" : {
> "type" : "text",
> "extensions" : [ "tbl" ],
> "delimiter" : "|"
> },
> "csv" : {
> "type" : "text",
> "extensions" : [ "csv" ],
> "delimiter" : ","
> },
> "tsv" : {
> "type" : "text",
> "extensions" : [ "tsv" ],
> "delimiter" : "\t"
> },
> "parquet" : {
> "type" : "parquet"
> },
> "json" : {
> "type" : "json"
> }
> }
> },
> "format" : {
> "type" : "json"
> },
> "selectionRoot" : "/mapr/demorig/data/nested/clicks/clicks.campaign.json",
> "cost" : 7876.0
> }, {
> "pop" : "project",
> "@id" : 5,
> "exprs" : [ {
> "ref" : "`trans_id`",
> "expr" : "`trans_id`"
> } ],
> "child" : 7,
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 7876.0
> }, {
> "pop" : "fs-scan",
> "@id" : 6,
> "files" : [ "maprfs:/mapr/demorig/data/nested/clicks/clicks.json" ],
> "storage" : {
> "type" : "file",
> "enabled" : true,
> "connection" : "maprfs:///",
> "workspaces" : {
> "root" : {
> "location" : "/mapr/demorig/data",
> "writable" : false,
> "storageformat" : null
> },
> "nested" : {
> "location" : "/mapr/demorig/data/nested",
> "writable" : true,
> "storageformat" : "parquet"
> },
> "flat" : {
> "location" : "/mapr/demorig/data/flat",
> "writable" : true,
> "storageformat" : "parquet"
> },
> "views" : {
> "location" : "/mapr/demorig/data/views",
> "writable" : true,
> "storageformat" : "parquet"
> },
> "yelp" : {
> "location" : "/mapr/demorig/data/yelp",
> "writable" : true,
> "storageformat" : "json"
> }
> },
> "formats" : {
> "psv" : {
> "type" : "text",
> "extensions" : [ "tbl" ],
> "delimiter" : "|"
> },
> "csv" : {
> "type" : "text",
> "extensions" : [ "csv" ],
> "delimiter" : ","
> },
> "tsv" : {
> "type" : "text",
> "extensions" : [ "tsv" ],
> "delimiter" : "\t"
> },
> "parquet" : {
> "type" : "parquet"
> },
> "json" : {
> "type" : "json"
> }
> }
> },
> "format" : {
> "type" : "json"
> },
> "selectionRoot" : "/mapr/demorig/data/nested/clicks/clicks.json",
> "cost" : 5097.0
> }, {
> "pop" : "project",
> "@id" : 4,
> "exprs" : [ {
> "ref" : "`trans_id`",
> "expr" : "`trans_id`"
> } ],
> "child" : 6,
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 5097.0
> }, {
> "pop" : "union-all",
> "@id" : 3,
> "children" : [ 5, 4 ],
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 12973.0
> }, {
> "pop" : "project",
> "@id" : 2,
> "exprs" : [ {
> "ref" : "`$f0`",
> "expr" : "0"
> } ],
> "child" : 3,
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 12973.0
> }, {
> "pop" : "streaming-aggregate",
> "@id" : 1,
> "child" : 2,
> "keys" : [ ],
> "exprs" : [ {
> "ref" : "`EXPR$0`",
> "expr" : "count(1) "
> } ],
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 1.0
> }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 1297.3
> } ]
> } |
> +------------+------------+
> 1 row selected (0.142 seconds)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)