pyspark_partition_key_change

matei Sat, 05 Oct 2013 13:26:14 -0700

Merge pull request #33 from AndreSchumacher/pyspark_partition_key_change

Fixing SPARK-602: PythonPartitioner


Currently, PythonPartitioner determines the partition ID by hashing a
byte-array representation of PySpark's key. This PR lets
PythonPartitioner use the actual partition ID computed by PySpark,
which is required, e.g., for sorting via PySpark.
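To illustrate the difference, here is a minimal Python sketch. It is not the code in this commit. It assumes the Python side ships each partition ID as an 8-byte big-endian signed long, models the old JVM-side hashing on java.util.Arrays.hashCode(byte[]), and uses a hypothetical range partition function of the kind a sort would need. All helper names are hypothetical.

```python
import struct

NUM_PARTITIONS = 4  # hypothetical partition count


def java_byte_array_hash(data: bytes) -> int:
    """Mimic java.util.Arrays.hashCode(byte[]), including 32-bit overflow."""
    h = 1
    for b in data:
        signed_b = b - 256 if b > 127 else b  # Java bytes are signed
        h = (31 * h + signed_b) & 0xFFFFFFFF  # keep 32 bits, like a Java int
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed int


def hash_bytes_partition(key_bytes: bytes, num_partitions: int) -> int:
    """Old approach (sketch): hash the serialized key bytes."""
    # Python's % is already non-negative for a positive modulus
    return java_byte_array_hash(key_bytes) % num_partitions


def encode_partition_id(pid: int) -> bytes:
    """Assumed wire format: 8-byte big-endian signed long."""
    return struct.pack("!q", pid)


def explicit_id_partition(key_bytes: bytes, num_partitions: int) -> int:
    """New approach (sketch): decode and use the ID computed in Python."""
    (pid,) = struct.unpack("!q", key_bytes)
    return pid % num_partitions


def range_partition(key: int) -> int:
    """Hypothetical partitioner for sorting: 0-99 -> 0, 100-199 -> 1, ..."""
    return min(key // 100, NUM_PARTITIONS - 1)


keys = [5, 150, 250, 399]
wanted = [range_partition(k) for k in keys]
encoded = [encode_partition_id(p) for p in wanted]
explicit = [explicit_id_partition(b, NUM_PARTITIONS) for b in encoded]
old = [hash_bytes_partition(b, NUM_PARTITIONS) for b in encoded]
print("wanted:  ", wanted)
print("explicit:", explicit)
print("hashed:  ", old)
```

With the hashing approach, the partition a record lands in depends on the hash of its bytes rather than on range_partition, so range-partitioned data is no longer ordered across partitions. Decoding the ID preserves the assignment PySpark chose.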


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/08641932
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/08641932
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/08641932

Branch: refs/heads/master
Commit: 08641932bd17910cb5a839cdc7daeebfe4ae7ada
Parents: 232765f c84946f
Author: Matei Zaharia <ma...@eecs.berkeley.edu>
Authored: Sat Oct 5 13:25:18 2013 -0700
Committer: Matei Zaharia <ma...@eecs.berkeley.edu>
Committed: Sat Oct 5 13:25:18 2013 -0700

----------------------------------------------------------------------
 .../apache/spark/api/python/PythonPartitioner.scala    | 10 +++++++---
 .../scala/org/apache/spark/api/python/PythonRDD.scala  |  6 +++---
 core/src/main/scala/org/apache/spark/util/Utils.scala  | 13 +++++++++++++
 .../test/scala/org/apache/spark/util/UtilsSuite.scala  | 11 +++++++++++
 python/pyspark/rdd.py                                  | 10 ++++++----
 python/pyspark/serializers.py                          |  4 ++++
 6 files changed, 44 insertions(+), 10 deletions(-)
----------------------------------------------------------------------
