[ 
https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204357#comment-14204357
 ] 

Xuefu Zhang commented on HIVE-8542:
-----------------------------------

{quote}
One thing I'm not quite sure about is whether we still need SHUFFLE_SORT. Ideally 
it should only be used for total order, but we can also achieve that with MR 
shuffle and setting #reducer to 1. I think Hive forces #reducer to 1 for order 
by queries, right?
{quote}
Yes, MR does total ordering by setting the number of reducers to 1, and in Spark 
we can achieve the same thing with an MR-style shuffle + 1 reducer. Historically, 
Spark didn't do a good job on SHUFFLE_SORT, but I heard they have made 
improvements lately. On the other hand, having 1 reducer isn't good either.

I think for now we keep both, but later we can do some benchmarking to see 
which performs better.

> Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-8542
>                 URL: https://issues.apache.org/jira/browse/HIVE-8542
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao
>            Assignee: Rui Li
>         Attachments: HIVE-8542.1-spark.patch, HIVE-8542.2-spark.patch, 
> HIVE-8542.3-spark.patch, HIVE-8542.4-spark.patch
>
>
> Currently, in the Spark branch, the results for these two test files are very 
> different from MR's. We need to find out the cause of this and identify any 
> potential bugs in our current implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
