[
https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076187#comment-14076187
]
Rui Li commented on HIVE-7527:
------------------------------
Hi [~xuefuz], I tried to run order by queries using Spark's sortByKey
transformation, but the result seems to be incorrect. I inserted the sortByKey
between HiveMapFunction and HiveReduceFunction (in place of partitionBy).
Wondering if this is the right way to do it...
I detect the order by by looking at the parent ReduceSink when a ReduceWork is
created and connected to a MapWork. It worked for my simple cases :)
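For reference, here is a minimal sketch (plain Python, not Spark or Hive code) of what changes when partitionBy is replaced with sortByKey. Spark's sortByKey range-partitions the keys before sorting each partition, so reading the partitions in order yields a global order. Hash partitioning followed by a per-partition sort only gives a local sort. The helper names (hash_partition, range_partition, sort_within_partitions, collect) are hypothetical, and the range bounds are computed from all keys rather than from a sample as Spark does.

```python
from bisect import bisect_left


def hash_partition(pairs, num_partitions):
    """Hypothetical stand-in for partitionBy with a hash partitioner."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def range_partition(pairs, num_partitions):
    """Hypothetical stand-in for the range partitioning behind sortByKey.

    Every key in partition i is <= every key in partition i + 1.
    Assumption: bounds come from all keys; Spark samples them instead.
    """
    parts = [[] for _ in range(num_partitions)]
    if not pairs:
        return parts
    keys = sorted(k for k, _ in pairs)
    bounds = [keys[(i + 1) * len(keys) // num_partitions - 1]
              for i in range(num_partitions - 1)]
    for key, value in pairs:
        parts[bisect_left(bounds, key)].append((key, value))
    return parts


def sort_within_partitions(parts):
    """Sort each partition by key only (a local sort)."""
    return [sorted(p, key=lambda kv: kv[0]) for p in parts]


def collect(parts):
    """Concatenate partitions in partition order."""
    return [pair for p in parts for pair in p]


rows = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (6, "f"), (3, "c")]

# sort by: each partition is ordered, the concatenation is not.
local = sort_within_partitions(hash_partition(rows, 3))
local_keys = [k for k, _ in collect(local)]

# order by: range partitioning makes the concatenation globally ordered.
ordered = sort_within_partitions(range_partition(rows, 3))
global_keys = [k for k, _ in collect(ordered)]

print(local_keys)
print(global_keys)
```

In this toy model the hash-partitioned output matches sort by semantics and the range-partitioned output matches order by semantics, even with more than one partition.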
> Support order by and sort by on Spark
> -------------------------------------
>
> Key: HIVE-7527
> URL: https://issues.apache.org/jira/browse/HIVE-7527
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling
> to achieve order by (global sort, one reducer) and sort by (local sort).
> Spark has a sort by transformation in different variations that can be used to
> support Hive's order by and sort by. However, we still need to evaluate
> whether Spark's sortBy can achieve the same functionality inherited from
> MapReduce's shuffle sort.
> Currently Hive on Spark should be able to run simple sort by or order by, by
> changing the current partitionBy to sortBy. This is the way to verify
> theories. A complete solution will not be available until we have a complete
> SparkPlanGenerator.
> There is also a question of how we determine that there is order by or sort
> by by just looking at the operator tree, from which Spark task is created.
> This is the responsibility of SparkPlanGenerator, but we need to have an idea.