[jira] [Updated] (HIVE-7527) Support order by and sort by on Spark

Xuefu Zhang (JIRA) Sun, 27 Jul 2014 15:41:58 -0700

     [ https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Xuefu Zhang updated HIVE-7527:
------------------------------

    Description: 
Currently Hive depends completely on MapReduce's sorting as part of shuffling 
to achieve order by (global sort, one reducer) and sort by (local sort within 
each reducer). Spark has a sort by transformation in different variations that 
can be used to support Hive's order by and sort by. However, we still need to 
evaluate whether Spark's sortBy can achieve the same functionality inherited 
from MapReduce's shuffle sort.
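
To make the two semantics concrete, here is a minimal sketch in plain Python. 
It is a toy model only: lists stand in for reducer partitions, no Spark or Hive 
API is called, and the function names (sort_by, order_by_single_reducer, 
order_by_range_partitioned) are made up for illustration. The range-partitioned 
variant is included as an assumption about how a global sort could avoid the 
single reducer; it is not a statement of what Hive on Spark will do.

```python
import bisect

# Toy model: each inner list is one reducer's partition. Nothing here is a Spark or Hive API.

def sort_by(rows, num_partitions):
    """Local sort: hash-partition the rows, then sort each partition on its own."""
    parts = [[] for _ in range(num_partitions)]
    for r in rows:
        parts[hash(r) % num_partitions].append(r)
    return [sorted(p) for p in parts]

def order_by_single_reducer(rows):
    """Global sort the MapReduce way: everything goes to one reducer and is sorted there."""
    return [sorted(rows)]

def order_by_range_partitioned(rows, boundaries):
    """Global sort over several partitions (assumed alternative).

    Rows are routed by sorted boundary values, so every row in partition i
    is <= every row in partition i + 1; sorting each partition then yields
    a total order when partitions are read in sequence.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        parts[bisect.bisect_right(boundaries, r)].append(r)
    return [sorted(p) for p in parts]

def concatenated(parts):
    """Read the partitions back in partition order."""
    return [r for p in parts for r in p]

rows = [5, 2, 9, 1, 7, 3, 8]
local = sort_by(rows, 2)                          # each partition sorted, no global order
one = order_by_single_reducer(rows)               # one partition, fully sorted
ranged = order_by_range_partitioned(rows, [4])    # two partitions, globally ordered
```

With sort_by, each partition is sorted but reading the partitions back to back 
generally does not give a sorted result; the other two functions do give a 
total order, which is the property order by needs.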

Currently Hive on Spark should be able to run simple sort by or order by 
queries by changing the current partitionBy to sortBy. This is a way to verify 
the theory. A complete solution will not be available until we have a complete 
SparkPlanGenerator.

There is also the question of how we determine that a query contains order by 
or sort by just by looking at the operator tree from which the Spark task is 
created. This is the responsibility of SparkPlanGenerator, but we need to have 
an idea of the approach.
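
One possible shape for that check, sketched in plain Python. Everything here is 
hypothetical: the Op class, its fields, and the rule "a shuffle boundary with 
sort keys and exactly one reducer means order by" are assumptions made for 
illustration, not Hive's operator classes and not the SparkPlanGenerator 
design. In particular, this heuristic does not tell a sort apart from other 
operations that also shuffle on keys, which is part of the open question.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Op:
    """Hypothetical operator-tree node; not a Hive class."""
    name: str
    children: List["Op"] = field(default_factory=list)
    sort_keys: List[str] = field(default_factory=list)
    num_reducers: Optional[int] = None  # None means the planner did not fix it

def classify_sorts(root: Op) -> List[Tuple[str, List[str]]]:
    """Walk the tree depth-first and label each shuffle boundary that has sort keys.

    Assumed rule: sort keys plus exactly one reducer is treated as order by;
    sort keys with any other reducer count is treated as sort by.
    """
    found = []
    stack = [root]
    while stack:
        op = stack.pop()
        if op.name == "ReduceSink" and op.sort_keys:
            kind = "order by" if op.num_reducers == 1 else "sort by"
            found.append((kind, op.sort_keys))
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(op.children))
    return found

def plan(num_reducers: Optional[int]) -> Op:
    """Build a small example tree: scan -> select -> shuffle on `key` -> sink."""
    sink = Op("FileSink")
    shuffle = Op("ReduceSink", [sink], sort_keys=["key"], num_reducers=num_reducers)
    return Op("TableScan", [Op("Select", [shuffle])])

order_by_plan = classify_sorts(plan(1))     # single reducer
sort_by_plan = classify_sorts(plan(None))   # reducer count left open
```

A plan generator could use a label like this to pick between a partition-only 
shuffle and a sorting shuffle, but whether the operator tree carries enough 
information for that decision still has to be confirmed.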

  was:
Currently Hive depends completely on MapReduce's sorting as part of shuffling 
to achieve order by (global sort, one reducer) and sort by (local sort).
Spark has a sort by transformation in different variations that can be used to 
support Hive's order by and sort by. However, we still need to evaluate whether 
Spark's sortBy can achieve the same functionality inherited from MapReduce's 
shuffle sort.

Currently Hive on Spark should be able to run simple sort by or order by 
queries by changing the current partitionBy to sortBy. This is a way to verify 
the theory. A complete solution will not be available until we have a complete 
SparkPlanGenerator.


> Support order by and sort by on Spark
> -------------------------------------
>
>                 Key: HIVE-7527
>                 URL: https://issues.apache.org/jira/browse/HIVE-7527
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling 
> to achieve order by (global sort, one reducer) and sort by (local sort within 
> each reducer). Spark has a sort by transformation in different variations 
> that can be used to support Hive's order by and sort by. However, we still 
> need to evaluate whether Spark's sortBy can achieve the same functionality 
> inherited from MapReduce's shuffle sort.
> Currently Hive on Spark should be able to run simple sort by or order by 
> queries by changing the current partitionBy to sortBy. This is a way to 
> verify the theory. A complete solution will not be available until we have a 
> complete SparkPlanGenerator.
> There is also the question of how we determine that a query contains order by 
> or sort by just by looking at the operator tree from which the Spark task is 
> created. This is the responsibility of SparkPlanGenerator, but we need to 
> have an idea of the approach.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
