Shankar created DRILL-4474: ------------------------------ Summary: Inconsistent behavior while using COUNT in select (Apache drill 1.2.0) Key: DRILL-4474 URL: https://issues.apache.org/jira/browse/DRILL-4474 Project: Apache Drill Issue Type: Test Affects Versions: 1.2.0 Environment: m3.xlarge AWS instances ( 3 nodes) CentOS6.5 x64 Reporter: Shankar
{quote} * we are using drill to retrieve the business data from game analytic. * we are running below queries on table of size 50GB (parquet) * We have found some major inconsistency in data when we use COUNT function. * Below is the case by case queries and their output. {color:blue}*Please analyse it carefully, to for clear understanding of behaviour. *{color} * Please let me know how to resolve this ? (or any earlier JIRA has been already created). * Hope this may be fixed in later versions. If not please do the needful. {quote} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-1 (Wrong result) ------------------------------------------------------------------------------------------------------------------------------------------------------ {color:red} {quote} 0: jdbc:drill:> select . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > ; +-----------+ | count | +-----------+ | 27645752 | +-----------+ 1 row selected (0.281 seconds) {quote} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-2 (Wrong result) ------------------------------------------------------------------------------------------------------------------------------------------------------ {quote} 0: jdbc:drill:> select . . . . . . . > count(sessionid), . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > ; +-----------+-------+ | EXPR$0 | cnt | +-----------+-------+ | 37772844 | 2108 | +-----------+-------+ 1 row selected (12.597 seconds) {quote} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-3 (Wrong result, only first count is correct) ------------------------------------------------------------------------------------------------------------------------------------------------------ {quote} 0: jdbc:drill:> select . . . . . . . > count(distinct sessionid), . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > ; +---------+-----------+ | EXPR$0 | cnt | +---------+-----------+ | 201941 | 37772844 | +---------+-----------+ 1 row selected (8.259 seconds) {quote} {color} {color:green} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-4 (Correct result) ------------------------------------------------------------------------------------------------------------------------------------------------------ {quote} 0: jdbc:drill:> select . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > ; +------+ | cnt | +------+ | 525 | +------+ 1 row selected (14.318 seconds) {quote} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-5 (Correct result) ------------------------------------------------------------------------------------------------------------------------------------------------------ {quote} 0: jdbc:drill:> select . . . . . . . > count(sessionid), . . . . . . . > count(distinct sessionid) . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click') . . . . . . . > ; +---------+---------+ | EXPR$0 | EXPR$1 | +---------+---------+ | 2108 | 525 | +---------+---------+ 1 row selected (19.355 seconds) {quote} {color} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-6 ------------------------------------------------------------------------------------------------------------------------------------------------------ {quote} 0: jdbc:drill:> explain plan for . . . . . . . > . . . . . . . > select . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > ; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@73ff10e1]) | { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ ], "queue" : 0, "resultMode" : "EXEC" }, "graph" : [ { "pop" : "DirectGroupScan", "@id" : 1, "cost" : 20.0 }, { "pop" : "screen", "@id" : 0, "child" : 1, "initialAllocation" : 1000000, "maxAllocation" : 10000000000, "cost" : 20.0 } ] } | +------+------+ 1 row selected (0.276 seconds) {quote} ------------------------------------------------------------------------------------------------------------------------------------------------------ CASE-7 ------------------------------------------------------------------------------------------------------------------------------------------------------ {quote} 0: jdbc:drill:> 0: jdbc:drill:> explain plan for . . . . . . . > . . . . . . . > select . . . . . . . > count(distinct sessionid), . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = 'Click' then sessionid end) as cnt . . . . . . . > from dfs.tmp.a_games_log_visit_base t . . . . . . . > ; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(EXPR$0=[$0], cnt=[$1]) 00-02 Project(EXPR$0=[$1], cnt=[$0]) 00-03 NestedLoopJoin(condition=[true], joinType=[inner]) 00-05 Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@59b7cda9]) 00-04 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) 00-06 UnionExchange 01-01 StreamAgg(group=[{}], EXPR$0=[COUNT($0)]) 01-02 HashAgg(group=[{0}]) 01-03 Project(sessionid=[$0]) 01-04 HashToRandomExchange(dist0=[[$0]]) 02-01 UnorderedMuxExchange 03-01 Project(sessionid=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))]) 03-02 HashAgg(group=[{0}]) 03-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=hdfs://namenode:9000/tmp/a_games_log_visit_base]], selectionRoot=hdfs://namenode:9000/tmp/a_games_log_visit_base, numFiles=1, usedMetadataFile=false, columns=[`sessionid`]]]) | { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ ], "queue" : 0, "resultMode" : "EXEC" }, "graph" : [ { "pop" : "DirectGroupScan", "@id" : 5, "cost" : 20.0 }, { "pop" : "parquet-scan", "@id" : 196611, "userName" : "hadoop", "entries" : [ { "path" : "hdfs://namenode:9000/tmp/a_games_log_visit_base" } ], "storage" : { "type" : "file", "enabled" : true, "connection" : "hdfs://namenode:9000", "workspaces" : { "root" : { "location" : "/tmp/", "writable" : true, "defaultInputFormat" : null }, "tmp" : { "location" : "/tmp", "writable" : true, "defaultInputFormat" : null } }, "formats" : { "psv" : { "type" : "text", "extensions" : [ "tbl" ], "delimiter" : "|" }, "csv" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," }, "tsv" : { "type" : "text", "extensions" : [ "tsv" ], "delimiter" : "\t" }, "parquet" : { "type" : "parquet" }, "json" : { "type" : "json" }, "avro" : { "type" : "avro" } } }, "format" : { "type" : "parquet" }, "columns" : [ "`sessionid`" ], "selectionRoot" : "hdfs://namenode:9000/tmp/a_games_log_visit_base", "fileSet" : [ "/tmp/a_games_log_visit_base/1_6_15.parquet", "/tmp/a_games_log_visit_base/1_1_9.parquet", "/tmp/a_games_log_visit_base/1_7_2.parquet", "/tmp/a_games_log_visit_base/1_5_7.parquet", "/tmp/a_games_log_visit_base/1_2_1.parquet", "/tmp/a_games_log_visit_base/1_4_23.parquet", "/tmp/a_games_log_visit_base/1_8_22.parquet", "/tmp/a_games_log_visit_base/1_3_11.parquet", "/tmp/a_games_log_visit_base/1_6_30.parquet", "/tmp/a_games_log_visit_base/1_7_10.parquet", "/tmp/a_games_log_visit_base/1_1_10.parquet", "/tmp/a_games_log_visit_base/1_0_24.parquet", "/tmp/a_games_log_visit_base/1_8_39.parquet", "/tmp/a_games_log_visit_base/1_2_16.parquet", "/tmp/a_games_log_visit_base/1_7_27.parquet", "/tmp/a_games_log_visit_base/1_3_4.parquet", "/tmp/a_games_log_visit_base/1_8_5.parquet", "/tmp/a_games_log_visit_base/1_8_13.parquet", "/tmp/a_games_log_visit_base/1_4_7.parquet", "/tmp/a_games_log_visit_base/1_5_12.parquet", "/tmp/a_games_log_visit_base/1_0_9.parquet", "/tmp/a_games_log_visit_base/1_4_14.parquet", "/tmp/a_games_log_visit_base/1_2_13.parquet", "/tmp/a_games_log_visit_base/1_0_15.parquet", "/tmp/a_games_log_visit_base/1_2_4.parquet", "/tmp/a_games_log_visit_base/1_6_24.parquet", "/tmp/a_games_log_visit_base/1_7_5.parquet", "/tmp/a_games_log_visit_base/1_6_2.parquet", "/tmp/a_games_log_visit_base/1_1_13.parquet", "/tmp/a_games_log_visit_base/1_1_1.parquet", "/tmp/a_games_log_visit_base/1_3_7.parquet", "/tmp/a_games_log_visit_base/1_0_12.parquet", "/tmp/a_games_log_visit_base/1_0_3.parquet", "/tmp/a_games_log_visit_base/1_0_29.parquet", "/tmp/a_games_log_visit_base/1_1_24.parquet", "/tmp/a_games_log_visit_base/1_6_18.parquet", "/tmp/a_games_log_visit_base/1_2_19.parquet", "/tmp/a_games_log_visit_base/1_3_16.parquet", "/tmp/a_games_log_visit_base/1_6_27.parquet", "/tmp/a_games_log_visit_base/1_4_11.parquet", "/tmp/a_games_log_visit_base/1_7_15.parquet", "/tmp/a_games_log_visit_base/1_5_2.parquet", "/tmp/a_games_log_visit_base/1_8_10.parquet", "/tmp/a_games_log_visit_base/1_6_7.parquet", "/tmp/a_games_log_visit_base/1_8_2.parquet", "/tmp/a_games_log_visit_base/1_2_9.parquet", "/tmp/a_games_log_visit_base/1_7_18.parquet", "/tmp/a_games_log_visit_base/1_3_20.parquet", "/tmp/a_games_log_visit_base/1_7_31.parquet", "/tmp/a_games_log_visit_base/1_1_16.parquet", "/tmp/a_games_log_visit_base/1_6_35.parquet", "/tmp/a_games_log_visit_base/1_1_4.parquet", "/tmp/a_games_log_visit_base/1_4_19.parquet", "/tmp/a_games_log_visit_base/1_0_6.parquet", "/tmp/a_games_log_visit_base/1_8_18.parquet", "/tmp/a_games_log_visit_base/1_7_8.parquet", "/tmp/a_games_log_visit_base/1_5_15.parquet", "/tmp/a_games_log_visit_base/1_4_20.parquet", "/tmp/a_games_log_visit_base/1_0_21.parquet", "/tmp/a_games_log_visit_base/1_8_31.parquet", "/tmp/a_games_log_visit_base/1_4_4.parquet", "/tmp/a_games_log_visit_base/1_3_19.parquet", "/tmp/a_games_log_visit_base/1_1_21.parquet", "/tmp/a_games_log_visit_base/1_2_18.parquet", "/tmp/a_games_log_visit_base/1_8_20.parquet", "/tmp/a_games_log_visit_base/1_8_37.parquet", "/tmp/a_games_log_visit_base/1_0_13.parquet", "/tmp/a_games_log_visit_base/1_0_4.parquet", "/tmp/a_games_log_visit_base/1_6_13.parquet", "/tmp/a_games_log_visit_base/1_0_22.parquet", "/tmp/a_games_log_visit_base/1_1_19.parquet", "/tmp/a_games_log_visit_base/1_3_2.parquet", "/tmp/a_games_log_visit_base/1_7_29.parquet", "/tmp/a_games_log_visit_base/1_5_1.parquet", "/tmp/a_games_log_visit_base/1_8_11.parquet", "/tmp/a_games_log_visit_base/1_4_25.parquet", "/tmp/a_games_log_visit_base/1_6_32.parquet", "/tmp/a_games_log_visit_base/1_7_12.parquet", "/tmp/a_games_log_visit_base/1_7_0.parquet", "/tmp/a_games_log_visit_base/1_6_4.parquet", "/tmp/a_games_log_visit_base/1_6_22.parquet", "/tmp/a_games_log_visit_base/1_4_5.parquet", "/tmp/a_games_log_visit_base/1_8_3.parquet", "/tmp/a_games_log_visit_base/1_1_11.parquet", "/tmp/a_games_log_visit_base/1_1_3.parquet", "/tmp/a_games_log_visit_base/1_8_34.parquet", "/tmp/a_games_log_visit_base/1_2_6.parquet", "/tmp/a_games_log_visit_base/1_5_14.parquet", "/tmp/a_games_log_visit_base/1_0_7.parquet", "/tmp/a_games_log_visit_base/1_4_16.parquet", "/tmp/a_games_log_visit_base/1_6_10.parquet", "/tmp/a_games_log_visit_base/1_7_3.parquet", "/tmp/a_games_log_visit_base/1_0_27.parquet", "/tmp/a_games_log_visit_base/1_0_1.parquet", "/tmp/a_games_log_visit_base/1_0_10.parquet", "/tmp/a_games_log_visit_base/1_5_4.parquet", "/tmp/a_games_log_visit_base/1_8_19.parquet", "/tmp/a_games_log_visit_base/1_4_22.parquet", "/tmp/a_games_log_visit_base/1_7_24.parquet", "/tmp/a_games_log_visit_base/1_8_8.parquet", "/tmp/a_games_log_visit_base/1_8_25.parquet", "/tmp/a_games_log_visit_base/1_6_9.parquet", "/tmp/a_games_log_visit_base/1_5_17.parquet", "/tmp/a_games_log_visit_base/1_3_5.parquet", "/tmp/a_games_log_visit_base/1_7_30.parquet", "/tmp/a_games_log_visit_base/1_3_14.parquet", "/tmp/a_games_log_visit_base/1_1_6.parquet", "/tmp/a_games_log_visit_base/1_6_16.parquet", "/tmp/a_games_log_visit_base/1_2_0.parquet", "/tmp/a_games_log_visit_base/1_5_21.parquet", "/tmp/a_games_log_visit_base/1_1_14.parquet", "/tmp/a_games_log_visit_base/1_7_33.parquet", "/tmp/a_games_log_visit_base/1_8_28.parquet", "/tmp/a_games_log_visit_base/1_0_18.parquet", "/tmp/a_games_log_visit_base/1_4_13.parquet", "/tmp/a_games_log_visit_base/1_7_21.parquet", "/tmp/a_games_log_visit_base/1_2_3.parquet", "/tmp/a_games_log_visit_base/1_3_17.parquet", "/tmp/a_games_log_visit_base/1_4_2.parquet", "/tmp/a_games_log_visit_base/1_8_16.parquet", "/tmp/a_games_log_visit_base/1_6_25.parquet", "/tmp/a_games_log_visit_base/1_8_0.parquet", "/tmp/a_games_log_visit_base/1_2_10.parquet", "/tmp/a_games_log_visit_base/1_6_1.parquet", "/tmp/a_games_log_visit_base/1_7_6.parquet", "/tmp/a_games_log_visit_base/1_3_8.parquet", "/tmp/a_games_log_visit_base/1_8_9.parquet", "/tmp/a_games_log_visit_base/1_3_15.parquet", "/tmp/a_games_log_visit_base/1_7_23.parquet", "/tmp/a_games_log_visit_base/1_6_28.parquet", "/tmp/a_games_log_visit_base/1_4_0.parquet", "/tmp/a_games_log_visit_base/1_4_27.parquet", "/tmp/a_games_log_visit_base/1_8_35.parquet", "/tmp/a_games_log_visit_base/1_6_19.parquet", "/tmp/a_games_log_visit_base/1_0_28.parquet", "/tmp/a_games_log_visit_base/1_5_16.parquet", "/tmp/a_games_log_visit_base/1_7_14.parquet", "/tmp/a_games_log_visit_base/1_5_22.parquet", "/tmp/a_games_log_visit_base/1_8_26.parquet", "/tmp/a_games_log_visit_base/1_6_34.parquet", "/tmp/a_games_log_visit_base/1_1_5.parquet", "/tmp/a_games_log_visit_base/1_1_23.parquet", "/tmp/a_games_log_visit_base/1_6_6.parquet", "/tmp/a_games_log_visit_base/1_5_3.parquet", "/tmp/a_games_log_visit_base/1_0_11.parquet", "/tmp/a_games_log_visit_base/1_0_2.parquet", "/tmp/a_games_log_visit_base/1_4_10.parquet", "/tmp/a_games_log_visit_base/1_2_8.parquet", "/tmp/a_games_log_visit_base/1_0_19.parquet", "/tmp/a_games_log_visit_base/1_7_32.parquet", "/tmp/a_games_log_visit_base/1_8_29.parquet", "/tmp/a_games_log_visit_base/1_8_1.parquet", "/tmp/a_games_log_visit_base/1_6_20.parquet", "/tmp/a_games_log_visit_base/1_7_17.parquet", "/tmp/a_games_log_visit_base/1_3_0.parquet", "/tmp/a_games_log_visit_base/1_1_17.parquet", "/tmp/a_games_log_visit_base/1_4_18.parquet", "/tmp/a_games_log_visit_base/1_7_9.parquet", "/tmp/a_games_log_visit_base/1_8_32.parquet", "/tmp/a_games_log_visit_base/1_7_20.parquet", "/tmp/a_ | +------+------+ 1 row selected (0.503 seconds) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)