[GitHub] [hive] kasakrisz commented on a change in pull request #2656: HIVE-24579: Incorrect Result For Groupby With Limit

GitBox Tue, 28 Sep 2021 02:40:14 -0700


kasakrisz commented on a change in pull request #2656:
URL: https://github.com/apache/hive/pull/2656#discussion_r717404007




##########
File path: ql/src/test/results/clientpositive/llap/groupby1_limit.q.out
##########
@@ -71,33 +71,34 @@ STAGE PLANS:
                 mode: mergepartial
                 outputColumnNames: _col0, _col1
                 Statistics: Num rows: 316 Data size: 30020 Basic stats: COMPLETE Column stats: COMPLETE
-                Limit
-                  Number of rows: 5
-                  Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
-                  Reduce Output Operator
-                    null sort order: 
-                    sort order: 
-                    Statistics: Num rows: 5 Data size: 475 Basic stats: COMPLETE Column stats: COMPLETE
-                    TopN Hash Memory Usage: 0.1
-                    value expressions: _col0 (type: string), _col1 (type: double)
+                Reduce Output Operator

Review comment:
   But we still have a TopNKey operator in the mapper (in both the old and the new plan), and it filters out the majority of the rows.
   
   This query has the same issue as the example in the Jira: it has a GROUP BY with a LIMIT plus an aggregate function in the projection:
   ```
   SELECT src.key, sum(substr(src.value,5)) FROM src GROUP BY src.key LIMIT 5
   ``` 
   If no ordering is specified, we may end up with incorrect aggregations.
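
To make the concern concrete, here is a toy Python model of how pruning rows before the final aggregation, with no agreed ordering, could produce wrong sums. This is only a sketch under stated assumptions, not Hive's actual TopNKey or Limit implementation: `limit_keys_per_mapper`, `aggregate`, and the sample rows are all hypothetical, and each "mapper" simply keeps the first `n` distinct keys it happens to see.

```python
from collections import defaultdict


def limit_keys_per_mapper(rows, n):
    """Hypothetical mapper-side pruning: keep only rows whose key is among
    the first n distinct keys this mapper encounters (no ordering agreed)."""
    kept_keys = []
    out = []
    for key, value in rows:
        if key not in kept_keys:
            if len(kept_keys) == n:
                continue  # key arrived too late on this mapper, row dropped
            kept_keys.append(key)
        out.append((key, value))
    return out


def aggregate(rows):
    """Reducer-side GROUP BY key with SUM(value)."""
    sums = defaultdict(float)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


# Two mappers see the same keys in a different arrival order.
mapper_a = [("k1", 1.0), ("k2", 2.0), ("k3", 3.0)]
mapper_b = [("k3", 30.0), ("k2", 20.0), ("k1", 10.0)]

# Aggregating all rows gives the true per-key sums.
correct = aggregate(mapper_a + mapper_b)

# Pruning to 2 keys per mapper first: the mappers disagree on which keys
# to keep, so k1 and k3 reach the reducer with only part of their rows.
pruned = aggregate(
    limit_keys_per_mapper(mapper_a, 2) + limit_keys_per_mapper(mapper_b, 2)
)

print(correct)
print(pruned)
```

In this model a final `LIMIT 2` could return `k1` with a sum of 1.0 instead of 11.0. If every mapper pruned by the same key ordering, they would all keep the same keys and the surviving groups would be complete.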



