Merge pull request #523 from JoshRosen/SPARK-1043

Switch from MUTF8 to UTF8 in PySpark serializers.

This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB.

This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.

Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/f8c742ce
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/f8c742ce
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/f8c742ce

Branch: refs/heads/master
Commit: f8c742ce274fbae2a9e616d4c97469b6a22069bb
Parents: 84670f2 1381fc7
Author: Josh Rosen <joshro...@apache.org>
Authored: Tue Jan 28 21:30:20 2014 -0800
Committer: Josh Rosen <joshro...@apache.org>
Committed: Tue Jan 28 21:30:20 2014 -0800

----------------------------------------------------------------------
 .../org/apache/spark/api/python/PythonRDD.scala | 18 +++++++---
 .../spark/api/python/PythonRDDSuite.scala       | 35 ++++++++++++++++++++
 python/pyspark/context.py                       |  4 +--
 python/pyspark/serializers.py                   |  6 ++--
 python/pyspark/worker.py                        |  8 ++---
 5 files changed, 57 insertions(+), 14 deletions(-)
----------------------------------------------------------------------
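
For readers wondering where the 64kB limit in SPARK-1043 comes from: Java's DataOutputStream.writeUTF frames a string with a 2-byte unsigned length, so the encoded payload cannot exceed 65535 bytes. The sketch below is not the code from this commit. It is a minimal Python illustration, with hypothetical helper names (write_short_prefixed, write_int_prefixed, read_int_prefixed), of how a 2-byte length prefix fails on large strings while a 4-byte length prefix followed by plain UTF-8 bytes does not. It models only the length limit and ignores the other differences between modified UTF-8 and standard UTF-8.

```python
import io
import struct

# Largest payload a 2-byte unsigned length prefix can describe.
MAX_SHORT_PREFIXED_BYTES = 0xFFFF


def write_short_prefixed(stream, text):
    """Hypothetical stand-in for the old framing: 2-byte length, then bytes."""
    data = text.encode("utf-8")
    if len(data) > MAX_SHORT_PREFIXED_BYTES:
        raise ValueError("encoded string is %d bytes; limit is 65535" % len(data))
    stream.write(struct.pack(">H", len(data)))  # big-endian unsigned short
    stream.write(data)


def write_int_prefixed(stream, text):
    """Hypothetical stand-in for the new framing: 4-byte length, then UTF-8."""
    data = text.encode("utf-8")
    stream.write(struct.pack(">i", len(data)))  # big-endian signed int
    stream.write(data)


def read_int_prefixed(stream):
    """Read one string written by write_int_prefixed."""
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length).decode("utf-8")


if __name__ == "__main__":
    big = "x" * 70000  # larger than 64kB once encoded

    try:
        write_short_prefixed(io.BytesIO(), big)
    except ValueError as err:
        print("short prefix failed:", err)

    buf = io.BytesIO()
    write_int_prefixed(buf, big)
    buf.seek(0)
    print("int prefix round-trip ok:", read_int_prefixed(buf) == big)
```

With a 4-byte signed length, the practical ceiling moves from 64kB to roughly 2GB per string, which is presumably why an int-prefixed framing is a natural replacement here.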