[ 
https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7527:
-------------------------

    Attachment: HIVE-7527-spark.patch

> Support order by and sort by on Spark
> -------------------------------------
>
>                 Key: HIVE-7527
>                 URL: https://issues.apache.org/jira/browse/HIVE-7527
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>         Attachments: HIVE-7527-spark.patch
>
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling 
> to achieve order by (global sort, one reducer) and sort by (local sort).
> Spark has a sort by transformation in different variations that can be used 
> to support Hive's order by and sort by. However, we still need to evaluate 
> whether Spark's sortBy can achieve the same functionality inherited from 
> MapReduce's shuffle sort.
> Currently Hive on Spark should be able to run simple sort by or order by 
> queries by changing the current partitionBy to sortBy. This is a way to 
> verify the theory. A complete solution will not be available until we have a 
> complete SparkPlanGenerator.
> There is also the question of how to determine whether there is an order by 
> or sort by just by looking at the operator tree from which the Spark task is 
> created. This is the responsibility of SparkPlanGenerator, but we need to 
> have an idea.
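
For illustration only, the difference between the two semantics described above (order by as a global sort through one reducer, sort by as a local sort within each reducer) can be sketched in plain Java. This is not Hive or Spark code and is not taken from the attached patch. The class name SortSemanticsSketch and the methods sortBy and orderBy are hypothetical, and each "partition" is modeled as an in-memory list standing in for one reducer's input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: models reducer inputs as in-memory lists.
class SortSemanticsSketch {

    // "sort by" (local sort): each partition is sorted on its own,
    // with no ordering guarantee across partitions.
    static List<List<Integer>> sortBy(List<List<Integer>> partitions) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            List<Integer> sorted = new ArrayList<>(partition);
            Collections.sort(sorted);
            out.add(sorted);
        }
        return out;
    }

    // "order by" (global sort, one reducer): all rows are gathered
    // into a single partition and sorted there.
    static List<Integer> orderBy(List<List<Integer>> partitions) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            all.addAll(partition);
        }
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(5, 1, 9),
                Arrays.asList(4, 8, 2));
        System.out.println(sortBy(parts));   // [[1, 5, 9], [2, 4, 8]]
        System.out.println(orderBy(parts));  // [1, 2, 4, 5, 8, 9]
    }
}
```

In the sort by result each list is ordered but the two lists overlap in range, which is why it cannot stand in for order by without either a single partition or some additional partitioning step.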



--
This message was sent by Atlassian JIRA
(v6.2#6252)
