[
https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180563#comment-15180563
]
ASF GitHub Bot commented on DRILL-4474:
---------------------------------------
GitHub user jacques-n opened a pull request:
https://github.com/apache/drill/pull/406
DRILL-4474: Ensure that ConvertCountToDirectScan only pushes through
project when project is trivial.
There are probably additional optimizations here but as the code stood, we
completely ignored whatever was in the project above the scan below the
count(x).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jacques-n/drill DRILL-4474
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/406.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #406
----
commit 5ab13c68357008c07a86386aa14f6f92d8584e9b
Author: Jacques Nadeau <[email protected]>
Date: 2016-03-04T21:27:26Z
DRILL-4474: Ensure that ConvertCountToDirectScan only pushes through
project when project is trivial.
----
> Inconsistent behavior while using COUNT in select (Apache drill 1.2.0)
> ----------------------------------------------------------------------
>
> Key: DRILL-4474
> URL: https://issues.apache.org/jira/browse/DRILL-4474
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.5.0
> Environment: m3.xlarge AWS instances ( 3 nodes)
> CentOS6.5 x64
> Reporter: Shankar
> Priority: Blocker
>
> {quote}
> * We are using drill to retrieve the business data from game analytic.
> * We are running below queries on table of size 50GB (parquet)
> * We have found some major inconsistency in data when we use COUNT function.
> * Below is the case by case queries and their output. {color:blue}*Please
> analyse it carefully, to for clear understanding of behaviour. *{color}
> * Please let me know how to resolve this ? (or any earlier JIRA has been
> already created).
> * Hope this may be fixed in later versions. If not please do the needful.
> {quote}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-1 (Wrong result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +-----------+
> | count |
> +-----------+
> | 27645752 |
> +-----------+
> 1 row selected (0.281 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-2 (Wrong result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(sessionid),
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +-----------+-------+
> | EXPR$0 | cnt |
> +-----------+-------+
> | 37772844 | 2108 |
> +-----------+-------+
> 1 row selected (12.597 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-3 (Wrong result, only first count is correct)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:red}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(distinct sessionid),
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +---------+-----------+
> | EXPR$0 | cnt |
> +---------+-----------+
> | 201941 | 37772844 |
> +---------+-----------+
> 1 row selected (8.259 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-4 (Correct result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and
> t.event = 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +------+
> | cnt |
> +------+
> | 525 |
> +------+
> 1 row selected (14.318 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-5 (Correct result)
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {color:green}
> {quote}
> {noformat}
> 0: jdbc:drill:> select
> . . . . . . . > count(sessionid),
> . . . . . . . > count(distinct sessionid)
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click')
> . . . . . . . > ;
> +---------+---------+
> | EXPR$0 | EXPR$1 |
> +---------+---------+
> | 2108 | 525 |
> +---------+---------+
> 1 row selected (19.355 seconds)
> {noformat}
> {quote}
> {color}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-6
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {quote}
> {noformat}
> 0: jdbc:drill:> explain plan for
> . . . . . . . >
> . . . . . . . > select
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@73ff10e1])
> | {
> "head" : {
> "version" : 1,
> "generator" : {
> "type" : "ExplainHandler",
> "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
> },
> "graph" : [ {
> "pop" : "DirectGroupScan",
> "@id" : 1,
> "cost" : 20.0
> }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 1000000,
> "maxAllocation" : 10000000000,
> "cost" : 20.0
> } ]
> } |
> +------+------+
> 1 row selected (0.276 seconds)
> {noformat}
> {quote}
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> CASE-7
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> {quote}
> {noformat}
> 0: jdbc:drill:>
> 0: jdbc:drill:> explain plan for
> . . . . . . . >
> . . . . . . . > select
> . . . . . . . > count(distinct sessionid),
> . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
> 'Click' then sessionid end) as cnt
> . . . . . . . > from dfs.tmp.a_games_log_visit_base t
> . . . . . . . > ;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(EXPR$0=[$0], cnt=[$1])
> 00-02 Project(EXPR$0=[$1], cnt=[$0])
> 00-03 NestedLoopJoin(condition=[true], joinType=[inner])
> 00-05
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@59b7cda9])
> 00-04 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-06 UnionExchange
> 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
> 01-02 HashAgg(group=[{0}])
> 01-03 Project(sessionid=[$0])
> 01-04 HashToRandomExchange(dist0=[[$0]])
> 02-01 UnorderedMuxExchange
> 03-01 Project(sessionid=[$0],
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02 HashAgg(group=[{0}])
> 03-03 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=hdfs://namenode:9000/tmp/a_games_log_visit_base]],
> selectionRoot=hdfs://namenode:9000/tmp/a_games_log_visit_base, numFiles=1,
> usedMetadataFile=false, columns=[`sessionid`]]])
> | {
> "head" : {
> "version" : 1,
> "generator" : {
> "type" : "ExplainHandler",
> "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
> },
> "graph" : [ {
> "pop" : "DirectGroupScan",
> "@id" : 5,
> "cost" : 20.0
> }, {
> "pop" : "parquet-scan",
> "@id" : 196611,
> "userName" : "hadoop",
> "entries" : [ {
> "path" : "hdfs://namenode:9000/tmp/a_games_log_visit_base"
> } ],
> "storage" : {
> "type" : "file",
> "enabled" : true,
> "connection" : "hdfs://namenode:9000",
> "workspaces" : {
> "root" : {
> "location" : "/tmp/",
> "writable" : true,
> "defaultInputFormat" : null
> },
> "tmp" : {
> "location" : "/tmp",
> "writable" : true,
> "defaultInputFormat" : null
> }
> },
> "formats" : {
> "psv" : {
> "type" : "text",
> "extensions" : [ "tbl" ],
> "delimiter" : "|"
> },
> "csv" : {
> "type" : "text",
> "extensions" : [ "csv" ],
> "delimiter" : ","
> },
> "tsv" : {
> "type" : "text",
> "extensions" : [ "tsv" ],
> "delimiter" : "\t"
> },
> "parquet" : {
> "type" : "parquet"
> },
> "json" : {
> "type" : "json"
> },
> "avro" : {
> "type" : "avro"
> }
> }
> },
> "format" : {
> "type" : "parquet"
> },
> "columns" : [ "`sessionid`" ],
> "selectionRoot" : "hdfs://namenode:9000/tmp/a_games_log_visit_base",
> "fileSet" : [ "/tmp/a_games_log_visit_base/1_6_15.parquet",
> "/tmp/a_games_log_visit_base/1_1_9.parquet",
> "/tmp/a_games_log_visit_base/1_7_2.parquet",
> "/tmp/a_games_log_visit_base/1_5_7.parquet",
> "/tmp/a_games_log_visit_base/1_2_1.parquet",
> "/tmp/a_games_log_visit_base/1_4_23.parquet",
> "/tmp/a_games_log_visit_base/1_8_22.parquet",
> "/tmp/a_games_log_visit_base/1_3_11.parquet",
> "/tmp/a_games_log_visit_base/1_6_30.parquet",
> "/tmp/a_games_log_visit_base/1_7_10.parquet",
> "/tmp/a_games_log_visit_base/1_1_10.parquet",
> "/tmp/a_games_log_visit_base/1_0_24.parquet",
> "/tmp/a_games_log_visit_base/1_8_39.parquet",
> "/tmp/a_games_log_visit_base/1_2_16.parquet",
> "/tmp/a_games_log_visit_base/1_7_27.parquet",
> "/tmp/a_games_log_visit_base/1_3_4.parquet",
> "/tmp/a_games_log_visit_base/1_8_5.parquet",
> "/tmp/a_games_log_visit_base/1_8_13.parquet",
> "/tmp/a_games_log_visit_base/1_4_7.parquet",
> "/tmp/a_games_log_visit_base/1_5_12.parquet",
> "/tmp/a_games_log_visit_base/1_0_9.parquet",
> "/tmp/a_games_log_visit_base/1_4_14.parquet",
> "/tmp/a_games_log_visit_base/1_2_13.parquet",
> "/tmp/a_games_log_visit_base/1_0_15.parquet",
> "/tmp/a_games_log_visit_base/1_2_4.parquet",
> "/tmp/a_games_log_visit_base/1_6_24.parquet",
> "/tmp/a_games_log_visit_base/1_7_5.parquet",
> "/tmp/a_games_log_visit_base/1_6_2.parquet",
> "/tmp/a_games_log_visit_base/1_1_13.parquet",
> "/tmp/a_games_log_visit_base/1_1_1.parquet",
> "/tmp/a_games_log_visit_base/1_3_7.parquet",
> "/tmp/a_games_log_visit_base/1_0_12.parquet",
> "/tmp/a_games_log_visit_base/1_0_3.parquet",
> "/tmp/a_games_log_visit_base/1_0_29.parquet",
> "/tmp/a_games_log_visit_base/1_1_24.parquet",
> "/tmp/a_games_log_visit_base/1_6_18.parquet",
> "/tmp/a_games_log_visit_base/1_2_19.parquet",
> "/tmp/a_games_log_visit_base/1_3_16.parquet",
> "/tmp/a_games_log_visit_base/1_6_27.parquet",
> "/tmp/a_games_log_visit_base/1_4_11.parquet",
> "/tmp/a_games_log_visit_base/1_7_15.parquet",
> "/tmp/a_games_log_visit_base/1_5_2.parquet",
> "/tmp/a_games_log_visit_base/1_8_10.parquet",
> "/tmp/a_games_log_visit_base/1_6_7.parquet",
> "/tmp/a_games_log_visit_base/1_8_2.parquet",
> "/tmp/a_games_log_visit_base/1_2_9.parquet",
> "/tmp/a_games_log_visit_base/1_7_18.parquet",
> "/tmp/a_games_log_visit_base/1_3_20.parquet",
> "/tmp/a_games_log_visit_base/1_7_31.parquet",
> "/tmp/a_games_log_visit_base/1_1_16.parquet",
> "/tmp/a_games_log_visit_base/1_6_35.parquet",
> "/tmp/a_games_log_visit_base/1_1_4.parquet",
> "/tmp/a_games_log_visit_base/1_4_19.parquet",
> "/tmp/a_games_log_visit_base/1_0_6.parquet",
> "/tmp/a_games_log_visit_base/1_8_18.parquet",
> "/tmp/a_games_log_visit_base/1_7_8.parquet",
> "/tmp/a_games_log_visit_base/1_5_15.parquet",
> "/tmp/a_games_log_visit_base/1_4_20.parquet",
> "/tmp/a_games_log_visit_base/1_0_21.parquet",
> "/tmp/a_games_log_visit_base/1_8_31.parquet",
> "/tmp/a_games_log_visit_base/1_4_4.parquet",
> "/tmp/a_games_log_visit_base/1_3_19.parquet",
> "/tmp/a_games_log_visit_base/1_1_21.parquet",
> "/tmp/a_games_log_visit_base/1_2_18.parquet",
> "/tmp/a_games_log_visit_base/1_8_20.parquet",
> "/tmp/a_games_log_visit_base/1_8_37.parquet",
> "/tmp/a_games_log_visit_base/1_0_13.parquet",
> "/tmp/a_games_log_visit_base/1_0_4.parquet",
> "/tmp/a_games_log_visit_base/1_6_13.parquet",
> "/tmp/a_games_log_visit_base/1_0_22.parquet",
> "/tmp/a_games_log_visit_base/1_1_19.parquet",
> "/tmp/a_games_log_visit_base/1_3_2.parquet",
> "/tmp/a_games_log_visit_base/1_7_29.parquet",
> "/tmp/a_games_log_visit_base/1_5_1.parquet",
> "/tmp/a_games_log_visit_base/1_8_11.parquet",
> "/tmp/a_games_log_visit_base/1_4_25.parquet",
> "/tmp/a_games_log_visit_base/1_6_32.parquet",
> "/tmp/a_games_log_visit_base/1_7_12.parquet",
> "/tmp/a_games_log_visit_base/1_7_0.parquet",
> "/tmp/a_games_log_visit_base/1_6_4.parquet",
> "/tmp/a_games_log_visit_base/1_6_22.parquet",
> "/tmp/a_games_log_visit_base/1_4_5.parquet",
> "/tmp/a_games_log_visit_base/1_8_3.parquet",
> "/tmp/a_games_log_visit_base/1_1_11.parquet",
> "/tmp/a_games_log_visit_base/1_1_3.parquet",
> "/tmp/a_games_log_visit_base/1_8_34.parquet",
> "/tmp/a_games_log_visit_base/1_2_6.parquet",
> "/tmp/a_games_log_visit_base/1_5_14.parquet",
> "/tmp/a_games_log_visit_base/1_0_7.parquet",
> "/tmp/a_games_log_visit_base/1_4_16.parquet",
> "/tmp/a_games_log_visit_base/1_6_10.parquet",
> "/tmp/a_games_log_visit_base/1_7_3.parquet",
> "/tmp/a_games_log_visit_base/1_0_27.parquet",
> "/tmp/a_games_log_visit_base/1_0_1.parquet",
> "/tmp/a_games_log_visit_base/1_0_10.parquet",
> "/tmp/a_games_log_visit_base/1_5_4.parquet",
> "/tmp/a_games_log_visit_base/1_8_19.parquet",
> "/tmp/a_games_log_visit_base/1_4_22.parquet",
> "/tmp/a_games_log_visit_base/1_7_24.parquet",
> "/tmp/a_games_log_visit_base/1_8_8.parquet",
> "/tmp/a_games_log_visit_base/1_8_25.parquet",
> "/tmp/a_games_log_visit_base/1_6_9.parquet",
> "/tmp/a_games_log_visit_base/1_5_17.parquet",
> "/tmp/a_games_log_visit_base/1_3_5.parquet",
> "/tmp/a_games_log_visit_base/1_7_30.parquet",
> "/tmp/a_games_log_visit_base/1_3_14.parquet",
> "/tmp/a_games_log_visit_base/1_1_6.parquet",
> "/tmp/a_games_log_visit_base/1_6_16.parquet",
> "/tmp/a_games_log_visit_base/1_2_0.parquet",
> "/tmp/a_games_log_visit_base/1_5_21.parquet",
> "/tmp/a_games_log_visit_base/1_1_14.parquet",
> "/tmp/a_games_log_visit_base/1_7_33.parquet",
> "/tmp/a_games_log_visit_base/1_8_28.parquet",
> "/tmp/a_games_log_visit_base/1_0_18.parquet",
> "/tmp/a_games_log_visit_base/1_4_13.parquet",
> "/tmp/a_games_log_visit_base/1_7_21.parquet",
> "/tmp/a_games_log_visit_base/1_2_3.parquet",
> "/tmp/a_games_log_visit_base/1_3_17.parquet",
> "/tmp/a_games_log_visit_base/1_4_2.parquet",
> "/tmp/a_games_log_visit_base/1_8_16.parquet",
> "/tmp/a_games_log_visit_base/1_6_25.parquet",
> "/tmp/a_games_log_visit_base/1_8_0.parquet",
> "/tmp/a_games_log_visit_base/1_2_10.parquet",
> "/tmp/a_games_log_visit_base/1_6_1.parquet",
> "/tmp/a_games_log_visit_base/1_7_6.parquet",
> "/tmp/a_games_log_visit_base/1_3_8.parquet",
> "/tmp/a_games_log_visit_base/1_8_9.parquet",
> "/tmp/a_games_log_visit_base/1_3_15.parquet",
> "/tmp/a_games_log_visit_base/1_7_23.parquet",
> "/tmp/a_games_log_visit_base/1_6_28.parquet",
> "/tmp/a_games_log_visit_base/1_4_0.parquet",
> "/tmp/a_games_log_visit_base/1_4_27.parquet",
> "/tmp/a_games_log_visit_base/1_8_35.parquet",
> "/tmp/a_games_log_visit_base/1_6_19.parquet",
> "/tmp/a_games_log_visit_base/1_0_28.parquet",
> "/tmp/a_games_log_visit_base/1_5_16.parquet",
> "/tmp/a_games_log_visit_base/1_7_14.parquet",
> "/tmp/a_games_log_visit_base/1_5_22.parquet",
> "/tmp/a_games_log_visit_base/1_8_26.parquet",
> "/tmp/a_games_log_visit_base/1_6_34.parquet",
> "/tmp/a_games_log_visit_base/1_1_5.parquet",
> "/tmp/a_games_log_visit_base/1_1_23.parquet",
> "/tmp/a_games_log_visit_base/1_6_6.parquet",
> "/tmp/a_games_log_visit_base/1_5_3.parquet",
> "/tmp/a_games_log_visit_base/1_0_11.parquet",
> "/tmp/a_games_log_visit_base/1_0_2.parquet",
> "/tmp/a_games_log_visit_base/1_4_10.parquet",
> "/tmp/a_games_log_visit_base/1_2_8.parquet",
> "/tmp/a_games_log_visit_base/1_0_19.parquet",
> "/tmp/a_games_log_visit_base/1_7_32.parquet",
> "/tmp/a_games_log_visit_base/1_8_29.parquet",
> "/tmp/a_games_log_visit_base/1_8_1.parquet",
> "/tmp/a_games_log_visit_base/1_6_20.parquet",
> "/tmp/a_games_log_visit_base/1_7_17.parquet",
> "/tmp/a_games_log_visit_base/1_3_0.parquet",
> "/tmp/a_games_log_visit_base/1_1_17.parquet",
> "/tmp/a_games_log_visit_base/1_4_18.parquet",
> "/tmp/a_games_log_visit_base/1_7_9.parquet",
> "/tmp/a_games_log_visit_base/1_8_32.parquet",
> "/tmp/a_games_log_visit_base/1_7_20.parquet", "/tmp/a_ |
> +------+------+
> 1 row selected (0.503 seconds)
> {noformat}
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)