Shankar created DRILL-4474:
------------------------------
Summary: Inconsistent behavior while using COUNT in select (Apache
drill 1.2.0)
Key: DRILL-4474
URL: https://issues.apache.org/jira/browse/DRILL-4474
Project: Apache Drill
Issue Type: Test
Affects Versions: 1.2.0
Environment: m3.xlarge AWS instances ( 3 nodes)
CentOS6.5 x64
Reporter: Shankar
{quote}
* we are using drill to retrieve the business data from game analytic.
* we are running below queries on table of size 50GB (parquet)
* We have found some major inconsistency in data when we use COUNT function.
* Below is the case by case queries and their output. {color:blue}*Please
analyse it carefully, to for clear understanding of behaviour. *{color}
* Please let me know how to resolve this ? (or any earlier JIRA has been
already created).
* Hope this may be fixed in later versions. If not please do the needful.
{quote}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-1 (Wrong result)
------------------------------------------------------------------------------------------------------------------------------------------------------
{color:red}
{quote}
0: jdbc:drill:> select
. . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
'Click' then sessionid end) as cnt
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > ;
+-----------+
| count |
+-----------+
| 27645752 |
+-----------+
1 row selected (0.281 seconds)
{quote}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-2 (Wrong result)
------------------------------------------------------------------------------------------------------------------------------------------------------
{quote}
0: jdbc:drill:> select
. . . . . . . > count(sessionid),
. . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
'Click' then sessionid end) as cnt
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > ;
+-----------+-------+
| EXPR$0 | cnt |
+-----------+-------+
| 37772844 | 2108 |
+-----------+-------+
1 row selected (12.597 seconds)
{quote}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-3 (Wrong result, only first count is correct)
------------------------------------------------------------------------------------------------------------------------------------------------------
{quote}
0: jdbc:drill:> select
. . . . . . . > count(distinct sessionid),
. . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
'Click' then sessionid end) as cnt
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > ;
+---------+-----------+
| EXPR$0 | cnt |
+---------+-----------+
| 201941 | 37772844 |
+---------+-----------+
1 row selected (8.259 seconds)
{quote}
{color}
{color:green}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-4 (Correct result)
------------------------------------------------------------------------------------------------------------------------------------------------------
{quote}
0: jdbc:drill:> select
. . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and
t.event = 'Click' then sessionid end) as cnt
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > ;
+------+
| cnt |
+------+
| 525 |
+------+
1 row selected (14.318 seconds)
{quote}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-5 (Correct result)
------------------------------------------------------------------------------------------------------------------------------------------------------
{quote}
0: jdbc:drill:> select
. . . . . . . > count(sessionid),
. . . . . . . > count(distinct sessionid)
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > where ( t.id = '/confirmDrop/btnYes/' and t.event = 'Click')
. . . . . . . > ;
+---------+---------+
| EXPR$0 | EXPR$1 |
+---------+---------+
| 2108 | 525 |
+---------+---------+
1 row selected (19.355 seconds)
{quote}
{color}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-6
------------------------------------------------------------------------------------------------------------------------------------------------------
{quote}
0: jdbc:drill:> explain plan for
. . . . . . . >
. . . . . . . > select
. . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
'Click' then sessionid end) as cnt
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > ;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01
Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@73ff10e1])
| {
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
},
"graph" : [ {
"pop" : "DirectGroupScan",
"@id" : 1,
"cost" : 20.0
}, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 1000000,
"maxAllocation" : 10000000000,
"cost" : 20.0
} ]
} |
+------+------+
1 row selected (0.276 seconds)
{quote}
------------------------------------------------------------------------------------------------------------------------------------------------------
CASE-7
------------------------------------------------------------------------------------------------------------------------------------------------------
{quote}
0: jdbc:drill:>
0: jdbc:drill:> explain plan for
. . . . . . . >
. . . . . . . > select
. . . . . . . > count(distinct sessionid),
. . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event =
'Click' then sessionid end) as cnt
. . . . . . . > from dfs.tmp.a_games_log_visit_base t
. . . . . . . > ;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(EXPR$0=[$0], cnt=[$1])
00-02 Project(EXPR$0=[$1], cnt=[$0])
00-03 NestedLoopJoin(condition=[true], joinType=[inner])
00-05
Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@59b7cda9])
00-04 StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-06 UnionExchange
01-01 StreamAgg(group=[{}], EXPR$0=[COUNT($0)])
01-02 HashAgg(group=[{0}])
01-03 Project(sessionid=[$0])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project(sessionid=[$0],
E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
03-02 HashAgg(group=[{0}])
03-03 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath
[path=hdfs://namenode:9000/tmp/a_games_log_visit_base]],
selectionRoot=hdfs://namenode:9000/tmp/a_games_log_visit_base, numFiles=1,
usedMetadataFile=false, columns=[`sessionid`]]])
| {
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ ],
"queue" : 0,
"resultMode" : "EXEC"
},
"graph" : [ {
"pop" : "DirectGroupScan",
"@id" : 5,
"cost" : 20.0
}, {
"pop" : "parquet-scan",
"@id" : 196611,
"userName" : "hadoop",
"entries" : [ {
"path" : "hdfs://namenode:9000/tmp/a_games_log_visit_base"
} ],
"storage" : {
"type" : "file",
"enabled" : true,
"connection" : "hdfs://namenode:9000",
"workspaces" : {
"root" : {
"location" : "/tmp/",
"writable" : true,
"defaultInputFormat" : null
},
"tmp" : {
"location" : "/tmp",
"writable" : true,
"defaultInputFormat" : null
}
},
"formats" : {
"psv" : {
"type" : "text",
"extensions" : [ "tbl" ],
"delimiter" : "|"
},
"csv" : {
"type" : "text",
"extensions" : [ "csv" ],
"delimiter" : ","
},
"tsv" : {
"type" : "text",
"extensions" : [ "tsv" ],
"delimiter" : "\t"
},
"parquet" : {
"type" : "parquet"
},
"json" : {
"type" : "json"
},
"avro" : {
"type" : "avro"
}
}
},
"format" : {
"type" : "parquet"
},
"columns" : [ "`sessionid`" ],
"selectionRoot" : "hdfs://namenode:9000/tmp/a_games_log_visit_base",
"fileSet" : [ "/tmp/a_games_log_visit_base/1_6_15.parquet",
"/tmp/a_games_log_visit_base/1_1_9.parquet",
"/tmp/a_games_log_visit_base/1_7_2.parquet",
"/tmp/a_games_log_visit_base/1_5_7.parquet",
"/tmp/a_games_log_visit_base/1_2_1.parquet",
"/tmp/a_games_log_visit_base/1_4_23.parquet",
"/tmp/a_games_log_visit_base/1_8_22.parquet",
"/tmp/a_games_log_visit_base/1_3_11.parquet",
"/tmp/a_games_log_visit_base/1_6_30.parquet",
"/tmp/a_games_log_visit_base/1_7_10.parquet",
"/tmp/a_games_log_visit_base/1_1_10.parquet",
"/tmp/a_games_log_visit_base/1_0_24.parquet",
"/tmp/a_games_log_visit_base/1_8_39.parquet",
"/tmp/a_games_log_visit_base/1_2_16.parquet",
"/tmp/a_games_log_visit_base/1_7_27.parquet",
"/tmp/a_games_log_visit_base/1_3_4.parquet",
"/tmp/a_games_log_visit_base/1_8_5.parquet",
"/tmp/a_games_log_visit_base/1_8_13.parquet",
"/tmp/a_games_log_visit_base/1_4_7.parquet",
"/tmp/a_games_log_visit_base/1_5_12.parquet",
"/tmp/a_games_log_visit_base/1_0_9.parquet",
"/tmp/a_games_log_visit_base/1_4_14.parquet",
"/tmp/a_games_log_visit_base/1_2_13.parquet",
"/tmp/a_games_log_visit_base/1_0_15.parquet",
"/tmp/a_games_log_visit_base/1_2_4.parquet",
"/tmp/a_games_log_visit_base/1_6_24.parquet",
"/tmp/a_games_log_visit_base/1_7_5.parquet",
"/tmp/a_games_log_visit_base/1_6_2.parquet",
"/tmp/a_games_log_visit_base/1_1_13.parquet",
"/tmp/a_games_log_visit_base/1_1_1.parquet",
"/tmp/a_games_log_visit_base/1_3_7.parquet",
"/tmp/a_games_log_visit_base/1_0_12.parquet",
"/tmp/a_games_log_visit_base/1_0_3.parquet",
"/tmp/a_games_log_visit_base/1_0_29.parquet",
"/tmp/a_games_log_visit_base/1_1_24.parquet",
"/tmp/a_games_log_visit_base/1_6_18.parquet",
"/tmp/a_games_log_visit_base/1_2_19.parquet",
"/tmp/a_games_log_visit_base/1_3_16.parquet",
"/tmp/a_games_log_visit_base/1_6_27.parquet",
"/tmp/a_games_log_visit_base/1_4_11.parquet",
"/tmp/a_games_log_visit_base/1_7_15.parquet",
"/tmp/a_games_log_visit_base/1_5_2.parquet",
"/tmp/a_games_log_visit_base/1_8_10.parquet",
"/tmp/a_games_log_visit_base/1_6_7.parquet",
"/tmp/a_games_log_visit_base/1_8_2.parquet",
"/tmp/a_games_log_visit_base/1_2_9.parquet",
"/tmp/a_games_log_visit_base/1_7_18.parquet",
"/tmp/a_games_log_visit_base/1_3_20.parquet",
"/tmp/a_games_log_visit_base/1_7_31.parquet",
"/tmp/a_games_log_visit_base/1_1_16.parquet",
"/tmp/a_games_log_visit_base/1_6_35.parquet",
"/tmp/a_games_log_visit_base/1_1_4.parquet",
"/tmp/a_games_log_visit_base/1_4_19.parquet",
"/tmp/a_games_log_visit_base/1_0_6.parquet",
"/tmp/a_games_log_visit_base/1_8_18.parquet",
"/tmp/a_games_log_visit_base/1_7_8.parquet",
"/tmp/a_games_log_visit_base/1_5_15.parquet",
"/tmp/a_games_log_visit_base/1_4_20.parquet",
"/tmp/a_games_log_visit_base/1_0_21.parquet",
"/tmp/a_games_log_visit_base/1_8_31.parquet",
"/tmp/a_games_log_visit_base/1_4_4.parquet",
"/tmp/a_games_log_visit_base/1_3_19.parquet",
"/tmp/a_games_log_visit_base/1_1_21.parquet",
"/tmp/a_games_log_visit_base/1_2_18.parquet",
"/tmp/a_games_log_visit_base/1_8_20.parquet",
"/tmp/a_games_log_visit_base/1_8_37.parquet",
"/tmp/a_games_log_visit_base/1_0_13.parquet",
"/tmp/a_games_log_visit_base/1_0_4.parquet",
"/tmp/a_games_log_visit_base/1_6_13.parquet",
"/tmp/a_games_log_visit_base/1_0_22.parquet",
"/tmp/a_games_log_visit_base/1_1_19.parquet",
"/tmp/a_games_log_visit_base/1_3_2.parquet",
"/tmp/a_games_log_visit_base/1_7_29.parquet",
"/tmp/a_games_log_visit_base/1_5_1.parquet",
"/tmp/a_games_log_visit_base/1_8_11.parquet",
"/tmp/a_games_log_visit_base/1_4_25.parquet",
"/tmp/a_games_log_visit_base/1_6_32.parquet",
"/tmp/a_games_log_visit_base/1_7_12.parquet",
"/tmp/a_games_log_visit_base/1_7_0.parquet",
"/tmp/a_games_log_visit_base/1_6_4.parquet",
"/tmp/a_games_log_visit_base/1_6_22.parquet",
"/tmp/a_games_log_visit_base/1_4_5.parquet",
"/tmp/a_games_log_visit_base/1_8_3.parquet",
"/tmp/a_games_log_visit_base/1_1_11.parquet",
"/tmp/a_games_log_visit_base/1_1_3.parquet",
"/tmp/a_games_log_visit_base/1_8_34.parquet",
"/tmp/a_games_log_visit_base/1_2_6.parquet",
"/tmp/a_games_log_visit_base/1_5_14.parquet",
"/tmp/a_games_log_visit_base/1_0_7.parquet",
"/tmp/a_games_log_visit_base/1_4_16.parquet",
"/tmp/a_games_log_visit_base/1_6_10.parquet",
"/tmp/a_games_log_visit_base/1_7_3.parquet",
"/tmp/a_games_log_visit_base/1_0_27.parquet",
"/tmp/a_games_log_visit_base/1_0_1.parquet",
"/tmp/a_games_log_visit_base/1_0_10.parquet",
"/tmp/a_games_log_visit_base/1_5_4.parquet",
"/tmp/a_games_log_visit_base/1_8_19.parquet",
"/tmp/a_games_log_visit_base/1_4_22.parquet",
"/tmp/a_games_log_visit_base/1_7_24.parquet",
"/tmp/a_games_log_visit_base/1_8_8.parquet",
"/tmp/a_games_log_visit_base/1_8_25.parquet",
"/tmp/a_games_log_visit_base/1_6_9.parquet",
"/tmp/a_games_log_visit_base/1_5_17.parquet",
"/tmp/a_games_log_visit_base/1_3_5.parquet",
"/tmp/a_games_log_visit_base/1_7_30.parquet",
"/tmp/a_games_log_visit_base/1_3_14.parquet",
"/tmp/a_games_log_visit_base/1_1_6.parquet",
"/tmp/a_games_log_visit_base/1_6_16.parquet",
"/tmp/a_games_log_visit_base/1_2_0.parquet",
"/tmp/a_games_log_visit_base/1_5_21.parquet",
"/tmp/a_games_log_visit_base/1_1_14.parquet",
"/tmp/a_games_log_visit_base/1_7_33.parquet",
"/tmp/a_games_log_visit_base/1_8_28.parquet",
"/tmp/a_games_log_visit_base/1_0_18.parquet",
"/tmp/a_games_log_visit_base/1_4_13.parquet",
"/tmp/a_games_log_visit_base/1_7_21.parquet",
"/tmp/a_games_log_visit_base/1_2_3.parquet",
"/tmp/a_games_log_visit_base/1_3_17.parquet",
"/tmp/a_games_log_visit_base/1_4_2.parquet",
"/tmp/a_games_log_visit_base/1_8_16.parquet",
"/tmp/a_games_log_visit_base/1_6_25.parquet",
"/tmp/a_games_log_visit_base/1_8_0.parquet",
"/tmp/a_games_log_visit_base/1_2_10.parquet",
"/tmp/a_games_log_visit_base/1_6_1.parquet",
"/tmp/a_games_log_visit_base/1_7_6.parquet",
"/tmp/a_games_log_visit_base/1_3_8.parquet",
"/tmp/a_games_log_visit_base/1_8_9.parquet",
"/tmp/a_games_log_visit_base/1_3_15.parquet",
"/tmp/a_games_log_visit_base/1_7_23.parquet",
"/tmp/a_games_log_visit_base/1_6_28.parquet",
"/tmp/a_games_log_visit_base/1_4_0.parquet",
"/tmp/a_games_log_visit_base/1_4_27.parquet",
"/tmp/a_games_log_visit_base/1_8_35.parquet",
"/tmp/a_games_log_visit_base/1_6_19.parquet",
"/tmp/a_games_log_visit_base/1_0_28.parquet",
"/tmp/a_games_log_visit_base/1_5_16.parquet",
"/tmp/a_games_log_visit_base/1_7_14.parquet",
"/tmp/a_games_log_visit_base/1_5_22.parquet",
"/tmp/a_games_log_visit_base/1_8_26.parquet",
"/tmp/a_games_log_visit_base/1_6_34.parquet",
"/tmp/a_games_log_visit_base/1_1_5.parquet",
"/tmp/a_games_log_visit_base/1_1_23.parquet",
"/tmp/a_games_log_visit_base/1_6_6.parquet",
"/tmp/a_games_log_visit_base/1_5_3.parquet",
"/tmp/a_games_log_visit_base/1_0_11.parquet",
"/tmp/a_games_log_visit_base/1_0_2.parquet",
"/tmp/a_games_log_visit_base/1_4_10.parquet",
"/tmp/a_games_log_visit_base/1_2_8.parquet",
"/tmp/a_games_log_visit_base/1_0_19.parquet",
"/tmp/a_games_log_visit_base/1_7_32.parquet",
"/tmp/a_games_log_visit_base/1_8_29.parquet",
"/tmp/a_games_log_visit_base/1_8_1.parquet",
"/tmp/a_games_log_visit_base/1_6_20.parquet",
"/tmp/a_games_log_visit_base/1_7_17.parquet",
"/tmp/a_games_log_visit_base/1_3_0.parquet",
"/tmp/a_games_log_visit_base/1_1_17.parquet",
"/tmp/a_games_log_visit_base/1_4_18.parquet",
"/tmp/a_games_log_visit_base/1_7_9.parquet",
"/tmp/a_games_log_visit_base/1_8_32.parquet",
"/tmp/a_games_log_visit_base/1_7_20.parquet", "/tmp/a_ |
+------+------+
1 row selected (0.503 seconds)
{quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)