[ https://issues.apache.org/jira/browse/DRILL-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136759#comment-15136759 ]
Khurram Faraaz commented on DRILL-4321:
---------------------------------------
Here is the full JSON profile of the query that returns different results.
Test: Functional/aggregates/aggregation/count_distinct/with_min_max_c_float_group_by_1_cols.sql
Query: select count(distinct c_float), max(c_float), min(c_float) from alltypes_with_nulls group by c_date order by c_date
Full JSON profile:
{noformat}
{
"id": {
"part1": 2974517851546440000,
"part2": 8109210720415190000
},
"type": 1,
"start": 1454924741833,
"end": 1454924742416,
"query": "select count(distinct c_float), max(c_float), min(c_float) from alltypes_with_nulls group by c_date order by c_date",
"plan": "00-00 Screen : rowType = RecordType(BIGINT EXPR$0, ANY EXPR$1,
ANY EXPR$2): rowcount = 10.0, cumulative cost = {444.0 rows, 5588.877123795494
cpu, 0.0 io, 0.0 network, 4832.0 memory}, id = 4499875\n00-01
Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2]) : rowType = RecordType(BIGINT
EXPR$0, ANY EXPR$1, ANY EXPR$2): rowcount = 10.0, cumulative cost = {443.0
rows, 5587.877123795494 cpu, 0.0 io, 0.0 network, 4832.0 memory}, id =
4499874\n00-02 Project(EXPR$0=[$4], EXPR$1=[$1], EXPR$2=[$2],
c_date=[$0]) : rowType = RecordType(BIGINT EXPR$0, ANY EXPR$1, ANY EXPR$2, ANY
c_date): rowcount = 10.0, cumulative cost = {443.0 rows, 5587.877123795494 cpu,
0.0 io, 0.0 network, 4832.0 memory}, id = 4499873\n00-03
MergeJoin(condition=[IS NOT DISTINCT FROM($0, $3)], joinType=[inner]) : rowType
= RecordType(ANY c_date, ANY EXPR$1, ANY EXPR$2, ANY c_date0, BIGINT EXPR$0):
rowcount = 10.0, cumulative cost = {443.0 rows, 5587.877123795494 cpu, 0.0 io,
0.0 network, 4832.0 memory}, id = 4499872\n00-05
SelectionVectorRemover : rowType = RecordType(ANY c_date, ANY EXPR$1, ANY
EXPR$2): rowcount = 10.0, cumulative cost = {220.0 rows, 3542.8771237954943
cpu, 0.0 io, 0.0 network, 2000.0000000000002 memory}, id = 4499865\n00-07
Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY c_date, ANY
EXPR$1, ANY EXPR$2): rowcount = 10.0, cumulative cost = {210.0 rows,
3532.8771237954943 cpu, 0.0 io, 0.0 network, 2000.0000000000002 memory}, id =
4499864\n00-09 HashAgg(group=[{0}], EXPR$1=[MAX($1)],
EXPR$2=[MIN($1)]) : rowType = RecordType(ANY c_date, ANY EXPR$1, ANY EXPR$2):
rowcount = 10.0, cumulative cost = {200.0 rows, 3400.0 cpu, 0.0 io, 0.0
network, 1760.0000000000002 memory}, id = 4499863\n00-11
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///drill/testdata/aggregation/alltypes_with_nulls]],
selectionRoot=maprfs:/drill/testdata/aggregation/alltypes_with_nulls,
numFiles=1, usedMetadataFile=false, columns=[`c_date`, `c_float`]]]) : rowType
= RecordType(ANY c_date, ANY c_float): rowcount = 100.0, cumulative cost =
{100.0 rows, 200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4499862\n00-04
Project(c_date0=[$0], EXPR$0=[$1]) : rowType = RecordType(ANY c_date0,
BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {212.0 rows, 2001.0 cpu, 0.0
io, 0.0 network, 2832.0 memory}, id = 4499871\n00-06
SelectionVectorRemover : rowType = RecordType(ANY c_date, BIGINT EXPR$0):
rowcount = 1.0, cumulative cost = {212.0 rows, 2001.0 cpu, 0.0 io, 0.0 network,
2832.0 memory}, id = 4499870\n00-08 Sort(sort0=[$0], dir0=[ASC])
: rowType = RecordType(ANY c_date, BIGINT EXPR$0): rowcount = 1.0, cumulative
cost = {211.0 rows, 2000.0 cpu, 0.0 io, 0.0 network, 2832.0 memory}, id =
4499869\n00-10 HashAgg(group=[{0}], EXPR$0=[COUNT($1)]) :
rowType = RecordType(ANY c_date, BIGINT EXPR$0): rowcount = 1.0, cumulative
cost = {210.0 rows, 2000.0 cpu, 0.0 io, 0.0 network, 2816.0 memory}, id =
4499868\n00-12 HashAgg(group=[{0, 1}]) : rowType =
RecordType(ANY c_date, ANY c_float): rowcount = 10.0, cumulative cost = {200.0
rows, 1800.0 cpu, 0.0 io, 0.0 network, 2640.0 memory}, id = 4499867\n00-13
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///drill/testdata/aggregation/alltypes_with_nulls]],
selectionRoot=maprfs:/drill/testdata/aggregation/alltypes_with_nulls,
numFiles=1, usedMetadataFile=false, columns=[`c_date`, `c_float`]]]) : rowType
= RecordType(ANY c_date, ANY c_float): rowcount = 100.0, cumulative cost =
{100.0 rows, 200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 4499866\n",
"foreman": {
"address": "centos-04.qa.lab",
"userPort": 31010,
"controlPort": 31011,
"dataPort": 31012
},
"state": 2,
"totalFragments": 1,
"finishedFragments": 0,
"fragmentProfile": [
{
"majorFragmentId": 0,
"minorFragmentProfile": [
{
"state": 3,
"minorFragmentId": 0,
"operatorProfile": [
{
"inputProfile": [
{
"records": 100,
"batches": 1,
"schemas": 1
}
],
"operatorId": 11,
"operatorType": 21,
"setupNanos": 0,
"processNanos": 1199300,
"peakLocalMemoryAllocated": 3328,
"waitNanos": 404338
},
{
"inputProfile": [
{
"records": 100,
"batches": 1,
"schemas": 1
}
],
"operatorId": 9,
"operatorType": 3,
"setupNanos": 86644364,
"processNanos": 79392986,
"peakLocalMemoryAllocated": 3015936,
"metric": [
{
"metricId": 0,
"longValue": 65536
},
{
"metricId": 2,
"longValue": 0
},
{
"metricId": 1,
"longValue": 70
},
{
"metricId": 3,
"longValue": 0
}
],
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 1
}
],
"operatorId": 7,
"operatorType": 17,
"setupNanos": 0,
"processNanos": 2027130,
"peakLocalMemoryAllocated": 10657664,
"metric": [
{
"metricId": 2,
"longValue": 1
}
],
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 2
}
],
"operatorId": 5,
"operatorType": 14,
"setupNanos": 13539290,
"processNanos": 452781,
"peakLocalMemoryAllocated": 77824,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 100,
"batches": 1,
"schemas": 1
}
],
"operatorId": 13,
"operatorType": 21,
"setupNanos": 0,
"processNanos": 848316,
"peakLocalMemoryAllocated": 3328,
"waitNanos": 540899
},
{
"inputProfile": [
{
"records": 100,
"batches": 1,
"schemas": 1
}
],
"operatorId": 12,
"operatorType": 3,
"setupNanos": 54747060,
"processNanos": 9265289,
"peakLocalMemoryAllocated": 1835008,
"metric": [
{
"metricId": 0,
"longValue": 65536
},
{
"metricId": 2,
"longValue": 0
},
{
"metricId": 1,
"longValue": 100
},
{
"metricId": 3,
"longValue": 0
}
],
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 100,
"batches": 2,
"schemas": 1
}
],
"operatorId": 10,
"operatorType": 3,
"setupNanos": 31256696,
"processNanos": 20199873,
"peakLocalMemoryAllocated": 1967104,
"metric": [
{
"metricId": 0,
"longValue": 65536
},
{
"metricId": 2,
"longValue": 0
},
{
"metricId": 1,
"longValue": 70
},
{
"metricId": 3,
"longValue": 0
}
],
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 1
}
],
"operatorId": 8,
"operatorType": 17,
"setupNanos": 0,
"processNanos": 2049647,
"peakLocalMemoryAllocated": 10657408,
"metric": [
{
"metricId": 2,
"longValue": 1
}
],
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 2
}
],
"operatorId": 6,
"operatorType": 14,
"setupNanos": 11824877,
"processNanos": 419656,
"peakLocalMemoryAllocated": 69632,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 2
}
],
"operatorId": 4,
"operatorType": 10,
"setupNanos": 205129,
"processNanos": 22102,
"peakLocalMemoryAllocated": 69632,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 2
},
{
"records": 70,
"batches": 2,
"schemas": 1
}
],
"operatorId": 3,
"operatorType": 5,
"setupNanos": 21716244,
"processNanos": 1223759,
"peakLocalMemoryAllocated": 2363904,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 2
}
],
"operatorId": 2,
"operatorType": 10,
"setupNanos": 234101,
"processNanos": 43473,
"peakLocalMemoryAllocated": 1769472,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 1
}
],
"operatorId": 1,
"operatorType": 10,
"setupNanos": 71897,
"processNanos": 21247,
"peakLocalMemoryAllocated": 1179648,
"waitNanos": 0
},
{
"inputProfile": [
{
"records": 70,
"batches": 2,
"schemas": 1
}
],
"operatorId": 0,
"operatorType": 13,
"setupNanos": 0,
"processNanos": 74316,
"peakLocalMemoryAllocated": 1179648,
"metric": [
{
"metricId": 0,
"longValue": 1260
}
],
"waitNanos": 1127511
}
],
"startTime": 1454924742036,
"endTime": 1454924742410,
"memoryUsed": 0,
"maxMemoryUsed": 54015936,
"endpoint": {
"address": "centos-04.qa.lab",
"userPort": 31010,
"controlPort": 31011,
"dataPort": 31012
},
"lastUpdate": 1454924742410,
"lastProgress": 1454924742410
}
]
}
],
"user": "mapr"
}
{noformat}
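To see which side of the MergeJoin dominates, it can help to rank the operators in the profile by processing time. The following is a minimal sketch, assuming the profile has been saved as valid JSON (the mail line-wrapping inside the "query" and "plan" strings would have to be undone first). The function name `operators_by_process_time` and the trimmed `SAMPLE` are hypothetical; only the field names and the three sample operators are taken from the profile above.

```python
import json


def operators_by_process_time(profile):
    """Return one row per operator, slowest first.

    Each row is (majorFragmentId, minorFragmentId, operatorId,
    operatorType, total input records, processNanos).
    """
    rows = []
    for major in profile.get("fragmentProfile", []):
        for minor in major.get("minorFragmentProfile", []):
            for op in minor.get("operatorProfile", []):
                # An operator such as the join has more than one input.
                records_in = sum(
                    inp.get("records", 0) for inp in op.get("inputProfile", [])
                )
                rows.append((
                    major["majorFragmentId"],
                    minor["minorFragmentId"],
                    op["operatorId"],
                    op["operatorType"],
                    records_in,
                    op.get("processNanos", 0),
                ))
    rows.sort(key=lambda row: row[-1], reverse=True)
    return rows


# Hypothetical trimmed sample: operators 9, 10 and 3 copied from the profile.
SAMPLE = json.loads("""
{
  "fragmentProfile": [{
    "majorFragmentId": 0,
    "minorFragmentProfile": [{
      "minorFragmentId": 0,
      "operatorProfile": [
        {"inputProfile": [{"records": 100, "batches": 1, "schemas": 1}],
         "operatorId": 9, "operatorType": 3, "processNanos": 79392986},
        {"inputProfile": [{"records": 100, "batches": 2, "schemas": 1}],
         "operatorId": 10, "operatorType": 3, "processNanos": 20199873},
        {"inputProfile": [{"records": 70, "batches": 2, "schemas": 2},
                          {"records": 70, "batches": 2, "schemas": 1}],
         "operatorId": 3, "operatorType": 5, "processNanos": 1223759}
      ]
    }]
  }]
}
""")

if __name__ == "__main__":
    for row in operators_by_process_time(SAMPLE):
        print(row)
```

For a saved profile, the same function could be applied to `json.load(open(path))`, where `path` is wherever the profile was stored.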
> Difference in results count distinct with min max query on JDK8
> ---------------------------------------------------------------
>
> Key: DRILL-4321
> URL: https://issues.apache.org/jira/browse/DRILL-4321
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.4.0
> Environment: 4 node cluster
> Reporter: Khurram Faraaz
> Assignee: Deneche A. Hakim
> Labels: JDK8SUPPORT
> Attachments: expected_results.res
>
>
> A count distinct query with min, max, group by, and order by returns incorrect
> results on MapR Drill 1.4.0, MapR FS 5.0.0 GA, and JDK8.
> The difference is in how values are rounded after the decimal point when using
> JDK8.
> The expected results file can be found here:
> https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/aggregates/aggregation/count_distinct/with_min_max_c_float_group_by_1_cols.res
> The failing query is Functional/aggregates/aggregation/count_distinct/with_min_max_c_float_group_by_1_cols.sql:
> {noformat}
> select count(distinct c_float), max(c_float), min(c_float) from
> alltypes_with_nulls group by c_date order by c_date;
> {noformat}
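Because the description attributes the mismatch to how values are rounded after the decimal point, one possible way to tell a formatting difference from a genuinely wrong result is to compare float columns numerically within a tolerance instead of as strings. This is only a sketch: the helpers `cells_match` and `rows_match`, the tolerance, and the sample values are hypothetical and are not taken from drill-test-framework or from the expected results file.

```python
import math


def cells_match(expected, actual, rel_tol=1e-6):
    """Compare two result cells given as strings.

    Identical strings (including markers such as "null") match directly;
    otherwise both cells must parse as floats and agree within rel_tol.
    """
    if expected == actual:
        return True
    try:
        e, a = float(expected), float(actual)
    except ValueError:
        # At least one side is not numeric, so the strings had to be equal.
        return False
    return math.isclose(e, a, rel_tol=rel_tol)


def rows_match(expected_row, actual_row, rel_tol=1e-6):
    """Compare two rows cell by cell with the same tolerance."""
    if len(expected_row) != len(actual_row):
        return False
    return all(
        cells_match(e, a, rel_tol) for e, a in zip(expected_row, actual_row)
    )


if __name__ == "__main__":
    # Hypothetical values that differ only in the last printed digits.
    print(rows_match(["3", "0.123456789", "null"], ["3", "0.12345679", "null"]))
    # A real difference in a value is still reported as a mismatch.
    print(rows_match(["3", "1.5", "null"], ["3", "1.6", "null"]))
```

If the rows still differ under such a comparison, the discrepancy is more likely in the computed values than in how they are printed.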
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)