Merge pull request #33 from AndreSchumacher/pyspark_partition_key_change

Fixing SPARK-602: PythonPartitioner
Currently, PythonPartitioner determines the partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID instead, which is required, e.g., for sorting via PySpark.

Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/08641932
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/08641932
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/08641932

Branch: refs/heads/scala-2.10
Commit: 08641932bd17910cb5a839cdc7daeebfe4ae7ada
Parents: 232765f c84946f
Author: Matei Zaharia <ma...@eecs.berkeley.edu>
Authored: Sat Oct 5 13:25:18 2013 -0700
Committer: Matei Zaharia <ma...@eecs.berkeley.edu>
Committed: Sat Oct 5 13:25:18 2013 -0700

----------------------------------------------------------------------
 .../apache/spark/api/python/PythonPartitioner.scala   | 10 +++++++---
 .../scala/org/apache/spark/api/python/PythonRDD.scala |  6 +++---
 core/src/main/scala/org/apache/spark/util/Utils.scala | 13 +++++++++++++
 .../test/scala/org/apache/spark/util/UtilsSuite.scala | 11 +++++++++++
 python/pyspark/rdd.py                                 | 10 ++++++----
 python/pyspark/serializers.py                         |  4 ++++
 6 files changed, 44 insertions(+), 10 deletions(-)
----------------------------------------------------------------------
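To illustrate the idea behind the change, here is a minimal, self-contained Python sketch; it is not the code in this commit. It assumes that the Python side buckets each (key, value) pair by `partition_func(key) % num_partitions`, sends the partition ID ahead of each pickled batch as an 8-byte big-endian long, and that the JVM side uses that long directly as the partition ID instead of hashing the pickled key bytes. The names `pack_long`, `unpack_long`, `add_shuffle_key`, `get_partition`, and `shuffle` are hypothetical, and Python's built-in `hash` stands in for the partition function. The JVM side is simulated in Python.

```python
import pickle
import struct
from collections import defaultdict


def pack_long(value):
    # Hypothetical wire format: 8-byte big-endian signed long.
    return struct.pack("!q", value)


def unpack_long(data):
    return struct.unpack("!q", data)[0]


def add_shuffle_key(pairs, num_partitions, partition_func=hash):
    """Python side (sketch): group (k, v) pairs by the partition ID computed
    in Python and emit (packed_partition_id, pickled_batch) records."""
    buckets = defaultdict(list)
    for k, v in pairs:
        # Python's % is non-negative for a positive modulus.
        buckets[partition_func(k) % num_partitions].append((k, v))
    for split, items in buckets.items():
        yield pack_long(split), pickle.dumps(items)


def get_partition(packed_key, num_partitions):
    """Simulated JVM side (sketch): the shuffle key already *is* the
    partition ID, so it is decoded and used as-is, not re-hashed."""
    pid = unpack_long(packed_key)
    if not 0 <= pid < num_partitions:
        raise ValueError("partition ID out of range: %d" % pid)
    return pid


def shuffle(pairs, num_partitions):
    """Route every pickled batch to the partition chosen on the Python side."""
    partitions = defaultdict(list)
    for packed_key, batch in add_shuffle_key(pairs, num_partitions):
        partitions[get_partition(packed_key, num_partitions)].extend(pickle.loads(batch))
    return dict(partitions)


if __name__ == "__main__":
    result = shuffle([(i, str(i)) for i in range(10)], 3)
    for pid in sorted(result):
        print(pid, result[pid])
```

Because the receiving side no longer hashes the serialized key, whatever partition function runs in Python decides where each record lands. A range-based partition function, for example, can then place keys into ordered partitions, which is what sorting needs.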