kgyrtkirk commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717361733
##########
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##########
@@ -71,33 +71,34 @@ STAGE PLANS:
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE
- Limit
- Number of rows: 5
- Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
- Reduce Output Operator
- null sort order:
- sort order:
- Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
- TopN Hash Memory Usage: 0.1
- value expressions: _col0 (type: string), _col1 (type: double)
+ Reduce Output Operator
Review comment:
we lost the `Limit` operator here; as a result, we will be
shuffling all input rows.
I think this could become more costly for larger tables than the old plan.
I don't see the TopN hash enabled on the reduce output operator, which could
possibly save the day in this case; why did we lose that as well?
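To make the cost concern concrete, here is a toy sketch (not Hive code). The split of the plan's 316 estimated rows across 4 upstream tasks is an assumption made only for illustration, and the function names are hypothetical. It compares how many rows cross the shuffle boundary with and without a per-task limit of 5:

```python
from typing import List, Tuple

Row = Tuple[str, float]


def shuffle_all(tasks: List[List[Row]]) -> List[Row]:
    """Every task forwards all of its rows downstream (no Limit before the shuffle)."""
    return [row for task in tasks for row in task]


def shuffle_with_limit(tasks: List[List[Row]], n: int) -> List[Row]:
    """Each task forwards at most n rows (an unordered Limit before the shuffle)."""
    return [row for task in tasks for row in task[:n]]


# Assumed layout: 4 tasks with 79 rows each, 316 rows in total.
tasks = [[("key%d" % i, float(i)) for i in range(79)] for _ in range(4)]

print(len(shuffle_all(tasks)))            # 316 rows shuffled
print(len(shuffle_with_limit(tasks, 5)))  # 20 rows shuffled
```

Either way the final `Limit` keeps 5 rows, but the shuffled volume grows with the table size only in the first variant.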
##########
File path: ql/src/test/results/clientpositive/llap/limit_pushdown.q.out
##########
@@ -1075,6 +1072,13 @@ STAGE PLANS:
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: bigint)
+ Execution mode: vectorized, llap
+ LLAP IO: all inputs
+ Map 3
+ Map Operator Tree:
+ TableScan
+ alias: src
+ Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
Top N Key Operator
Review comment:
In the old plan, did we have 2 TopN key operators that are
equal?
This is unrelated to this patch, but we may have an issue with their
comparison, and because of that SWO is not able to simplify them.
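As a purely hypothetical illustration of what such a comparison issue could look like (this is not Hive's operator code, and I have not checked which field, if any, actually differs): if equality takes a non-semantic field into account, two otherwise identical operators are never treated as duplicates. The class names, the `row_estimate` field, and the values below are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StrictDesc:
    keys: tuple
    top_n: int
    row_estimate: int  # hypothetical field; participates in ==


@dataclass(frozen=True)
class SemanticDesc:
    keys: tuple
    top_n: int
    row_estimate: int = field(compare=False)  # ignored by ==


def dedupe(ops):
    """Keep one representative for each group of equal descriptors."""
    kept = []
    for op in ops:
        if op not in kept:
            kept.append(op)
    return kept


strict = [StrictDesc(("_col0",), 5, 500), StrictDesc(("_col0",), 5, 316)]
semantic = [SemanticDesc(("_col0",), 5, 500), SemanticDesc(("_col0",), 5, 316)]

print(len(dedupe(strict)))    # 2: the differing estimate blocks the merge
print(len(dedupe(semantic)))  # 1: equal once the estimate is ignored
```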
##########
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##########
@@ -71,33 +71,34 @@ STAGE PLANS:
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE
- Limit
- Number of rows: 5
- Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
- Reduce Output Operator
- null sort order:
- sort order:
- Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
- TopN Hash Memory Usage: 0.1
- value expressions: _col0 (type: string), _col1 (type: double)
+ Reduce Output Operator
Review comment:
I missed that, most likely because the row estimate was >100.
In that case this doesn't seem to be a problem; however, we should fix the
stat estimate for the TNKO. Could you open a ticket?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]