[ https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495152#comment-14495152 ]
watson xi commented on SPARK-4897:
----------------------------------

Hi guys, what's the status of this project? I know a few people (including myself) who are ready to wave goodbye to Python 2 (it's been 6.5 years now!). From an outside perspective looking in, Python 3 compatibility appears close!

> Python 3 support
> ----------------
>
>                 Key: SPARK-4897
>                 URL: https://issues.apache.org/jira/browse/SPARK-4897
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Josh Rosen
>            Assignee: Davies Liu
>            Priority: Minor
>
> It would be nice to have Python 3 support in PySpark, provided that we can do
> it in a way that maintains backwards-compatibility with Python 2.6.
> I started looking into porting this; my WIP work can be found at
> https://github.com/JoshRosen/spark/compare/python3
> I was able to use the
> [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1]
> tool to handle the basic conversion of things like {{print}} statements, etc.
> and had to manually fix up a few imports for packages that moved / were
> renamed, but the major blocker that I hit was {{cloudpickle}}:
> {code}
> [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
> Python 3.4.2 (default, Oct 19 2014, 17:52:17)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/shell.py", line 28, in <module>
>     import pyspark
>   File "/Users/joshrosen/Documents/spark/python/pyspark/__init__.py", line 41, in <module>
>     from pyspark.context import SparkContext
>   File "/Users/joshrosen/Documents/spark/python/pyspark/context.py", line 26, in <module>
>     from pyspark import accumulators
>   File "/Users/joshrosen/Documents/spark/python/pyspark/accumulators.py", line 97, in <module>
>     from pyspark.cloudpickle import CloudPickler
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 120, in <module>
>     class CloudPickler(pickle.Pickler):
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 122, in CloudPickler
>     dispatch = pickle.Pickler.dispatch.copy()
> AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
> {code}
> This code looks like it will be difficult to port to Python 3, so this
> might be a good reason to switch to
> [Dill|https://github.com/uqfoundation/dill] for Python serialization.
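
For readers puzzling over the AttributeError in the traceback above: on CPython 3, pickle.Pickler is normally the C-accelerated _pickle.Pickler, which does not expose a class-level dispatch dict, so the Python 2 idiom of copying pickle.Pickler.dispatch fails at import time. The following is only a minimal sketch of one possible workaround, assuming CPython 3 and relying on the private pure-Python class pickle._Pickler, which still carries a dispatch dict. The names SketchPickler, save_complex, and dumps are hypothetical and are not taken from PySpark or cloudpickle; this is not a claim about how the issue was actually resolved.

```python
import io
import pickle

# On CPython 3 with the C accelerator present, pickle.Pickler is
# _pickle.Pickler and has no class-level `dispatch` attribute.
# The result is printed rather than assumed, since it depends on the interpreter.
print("pickle.Pickler has dispatch:", hasattr(pickle.Pickler, "dispatch"))

# Records which objects went through the custom handler (illustration only).
handled = []


class SketchPickler(pickle._Pickler):
    """Hypothetical pickler built on the pure-Python implementation.

    pickle._Pickler is a private CPython name; using it is an assumption
    of this sketch, not a documented public API.
    """

    # Copy the table so per-type overrides do not leak into the stdlib class.
    dispatch = pickle._Pickler.dispatch.copy()


def save_complex(pickler, obj):
    """Hypothetical per-type handler, registered the way dispatch entries are."""
    handled.append(obj)
    pickler.save_reduce(complex, (obj.real, obj.imag), obj=obj)


SketchPickler.dispatch[complex] = save_complex


def dumps(obj, protocol=2):
    """Serialize obj with SketchPickler and return the resulting bytes."""
    buf = io.BytesIO()
    SketchPickler(buf, protocol).dump(obj)
    return buf.getvalue()


roundtrip = pickle.loads(dumps({"a": [1, 2, 3], "z": 1 + 2j}))
print(roundtrip)
```

The trade-off in this sketch is speed: the pure-Python pickler is slower than the C one, which is part of why switching serialization libraries was also on the table.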