[ https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204357#comment-14204357 ]
Xuefu Zhang commented on HIVE-8542:
-----------------------------------

{quote}
One thing I'm not quite sure is that if we still need SHUFFLE_SORT. Ideally it should only be used for total order, but we can also achieve that with MR shuffle and setting #reducer to 1. I think hive forces #reducer to 1 for order by query, right?
{quote}

Yes, MR does total ordering by setting the number of reducers to 1. And yes, in Spark we can achieve the same thing with an MR-style shuffle plus 1 reducer. Historically, Spark didn't do a good job with SHUFFLE_SORT, but I heard they have made improvements lately. On the other hand, having 1 reducer isn't good either, since all the data then goes through a single task. I think for now we keep both, but later we can do some benchmarking to see which performs better.

> Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-8542
>                 URL: https://issues.apache.org/jira/browse/HIVE-8542
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao
>            Assignee: Rui Li
>         Attachments: HIVE-8542.1-spark.patch, HIVE-8542.2-spark.patch, HIVE-8542.3-spark.patch, HIVE-8542.4-spark.patch
>
>
> Currently, in the Spark branch, results for these two test files are very different from MR's. We need to find out the cause of this and identify any potential bugs in our current implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)