[jira] [Commented] (HIVE-7527) Support order by and sort by on Spark

Rui Li (JIRA) Mon, 28 Jul 2014 05:37:31 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076187#comment-14076187
 ]


Rui Li commented on HIVE-7527:
------------------------------

Hi [~xuefuz], I tried to run order by queries using Spark's sortByKey 
transformation, but the result seems to be incorrect. I inserted the sortByKey 
between HiveMapFunction and HiveReduceFunction (in place of partitionBy). 
I'm wondering whether this is the right way to do it.
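
To make the difference between the two shuffles concrete, here is a minimal 
sketch in plain Java with no Spark or Hive dependency. It only models the idea: 
a partitionBy-style shuffle hashes keys to partitions, while a sortByKey-style 
shuffle assigns keys to partitions by range, so that reading the locally sorted 
partitions in order yields a global order. The class and method names are 
hypothetical, and the range bounds are hand-picked here, whereas Spark's 
sortByKey derives them by sampling the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the two shuffle styles; not Hive or Spark code.
final class ShuffleSortSketch {

    // partitionBy-style: route each key by hash, then sort each partition locally.
    static List<List<Integer>> hashPartitionThenLocalSort(List<Integer> keys, int numPartitions) {
        List<List<Integer>> parts = emptyPartitions(numPartitions);
        for (int k : keys) {
            parts.get(Math.floorMod(Integer.hashCode(k), numPartitions)).add(k);
        }
        for (List<Integer> p : parts) {
            Collections.sort(p);
        }
        return parts;
    }

    // sortByKey-style: route each key by range, then sort each partition locally.
    // upperBounds must be ascending; partition i holds keys <= upperBounds[i],
    // and the last partition holds everything above the final bound.
    static List<List<Integer>> rangePartitionThenLocalSort(List<Integer> keys, int[] upperBounds) {
        List<List<Integer>> parts = emptyPartitions(upperBounds.length + 1);
        for (int k : keys) {
            int p = 0;
            while (p < upperBounds.length && k > upperBounds[p]) {
                p++;
            }
            parts.get(p).add(k);
        }
        for (List<Integer> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    // Read the partitions back in partition order, as a downstream consumer would.
    static List<Integer> concat(List<List<Integer>> parts) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> p : parts) {
            out.addAll(p);
        }
        return out;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    private static List<List<Integer>> emptyPartitions(int n) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }
}
```

For the keys 5, 1, 4, 2, 3, 0, hashing into two partitions gives [0, 2, 4] and 
[1, 3, 5], which is sorted within each partition (sort by semantics) but not 
across them. Range partitioning with a single bound of 2 gives [0, 1, 2] and 
[3, 4, 5], which is globally ordered when read partition by partition. With a 
single partition, the hash variant degenerates to the one-reducer global sort 
that the issue description attributes to MapReduce.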

I detect an order by by inspecting the parent ReduceSink at the point where a 
ReduceWork is created and connected to a MapWork. This worked for my simple 
cases :)

> Support order by and sort by on Spark
> -------------------------------------
>
>                 Key: HIVE-7527
>                 URL: https://issues.apache.org/jira/browse/HIVE-7527
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling 
> to achieve order by (global sort, one reducer) and sort by (local sort).
> Spark has a sort by transformation in different variations that can be used 
> to support Hive's order by and sort by. However, we still need to evaluate 
> whether Spark's sortBy can achieve the same functionality inherited from 
> MapReduce's shuffle sort.
> Currently Hive on Spark should be able to run a simple sort by or order by 
> by changing the current partitionBy to sortBy. This is a way to verify the 
> theory. A complete solution will not be available until we have a complete 
> SparkPlanGenerator.
> There is also the question of how we determine that there is an order by or 
> sort by just by looking at the operator tree from which the Spark task is 
> created. This is the responsibility of SparkPlanGenerator, but we need to 
> have an idea.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
