Tim Armstrong has posted comments on this change.
Change subject: IMPALA-2581: LIMIT can be propagated down into some aggregations
......................................................................
Patch Set 4: -Code-Review
I'm seeing some odd behaviour while playing around with this. It looks like the
streaming aggregation still processes its full input, so e.g. "select
distinct * from tpch_20_parquet.lineitem limit 10" takes a while. I think what's
happening is that the aggregation doesn't return eos to the DataStreamSender
once the limit is reached, so the sender keeps calling GetNext() until the
input is exhausted. In the summary below, 01:AGGREGATE (STREAMING) takes about
33s and the scan still returns 119.99M rows:
Operator      #Hosts  Avg Time  Max Time  #Rows    Est. #Rows  Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
04:EXCHANGE   1       38.626us  38.626us  10       10          0         -1.00 B        UNPARTITIONED
03:AGGREGATE  3       54.343ms  57.428ms  28       10          2.48 MB   10.00 MB       FINALIZE
02:EXCHANGE   3       44.995us  54.989us  28       10          0         0              HASH(tpch_20_parquet.lineit...
01:AGGREGATE  3       33s149ms  33s383ms  30       10          5.00 MB   10.00 MB       STREAMING
00:SCAN HDFS  3       16s484ms  21s155ms  119.99M  119.99M     1.35 GB   1.38 GB        tpch_20_parquet.lineitem
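
To make the suspected failure mode concrete, here is a toy model of it. This is
only a sketch and not Impala's code: LimitedSource, Drain and the
signal_eos_at_limit flag are hypothetical names invented for illustration. The
source stands in for a node with a LIMIT, and Drain stands in for a sender that
calls GetNext() until it sees eos. When eos is only raised at end of input, the
loop consumes everything even though the limit was satisfied long before.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for an exec node that applies a LIMIT to its input.
class LimitedSource {
 public:
  LimitedSource(int64_t input_rows, int64_t limit, bool signal_eos_at_limit)
      : input_rows_(input_rows),
        limit_(limit),
        signal_eos_at_limit_(signal_eos_at_limit) {}

  // Consumes up to batch_size input rows and returns how many rows are
  // passed on to the caller (never more than the remaining limit).
  int64_t GetNext(int64_t batch_size, bool* eos) {
    int64_t consumed = std::min(batch_size, input_rows_ - rows_consumed_);
    rows_consumed_ += consumed;
    int64_t returned = std::min(consumed, limit_ - rows_returned_);
    rows_returned_ += returned;

    bool input_done = rows_consumed_ >= input_rows_;
    bool limit_done = rows_returned_ >= limit_;
    // Suspected bug: eos is raised only when the input is exhausted.
    // Suspected fix: also raise eos as soon as the limit is reached.
    *eos = input_done || (signal_eos_at_limit_ && limit_done);
    return returned;
  }

  int64_t rows_consumed() const { return rows_consumed_; }

 private:
  int64_t input_rows_;
  int64_t limit_;
  bool signal_eos_at_limit_;
  int64_t rows_consumed_ = 0;
  int64_t rows_returned_ = 0;
};

// Hypothetical consumer loop: keeps calling GetNext() until eos is set.
// Returns the total rows received and reports the number of calls made.
int64_t Drain(LimitedSource* src, int64_t batch_size, int64_t* calls) {
  bool eos = false;
  int64_t total = 0;
  *calls = 0;
  while (!eos) {
    total += src->GetNext(batch_size, &eos);
    ++*calls;
  }
  return total;
}
```

With 1000 input rows, a limit of 10 and batches of 4, both variants hand 10
rows to the caller. The variant that signals eos at the limit stops after 3
calls having consumed 12 input rows, while the other makes 250 calls and
consumes all 1000.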
--
To view, visit http://gerrit.cloudera.org:8080/3822
Gerrit-MessageType: comment
Gerrit-Change-Id: I59c5b7af7a73ccdbc5496b28eacb9b6859d202bc
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: No