[ 
https://issues.apache.org/jira/browse/SPARK-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495443#comment-14495443
 ] 

Davies Liu commented on SPARK-4897:
-----------------------------------

That PR is pretty close to merge, we are targeting this for 1.4 release. It 
will be helpful if you guy can test this in your environments. Currently, it's 
only covered by unit tests.

> Python 3 support
> ----------------
>
>                 Key: SPARK-4897
>                 URL: https://issues.apache.org/jira/browse/SPARK-4897
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: Josh Rosen
>            Assignee: Davies Liu
>            Priority: Minor
>
> It would be nice to have Python 3 support in PySpark, provided that we can do 
> it in a way that maintains backwards-compatibility with Python 2.6.
> I started looking into porting this; my WIP work can be found at 
> https://github.com/JoshRosen/spark/compare/python3
> I was able to use the 
> [futurize|http://python-future.org/futurize.html#forwards-conversion-stage1] 
> tool to handle the basic conversion of things like {{print}} statements, etc. 
> and had to manually fix up a few imports for packages that moved / were 
> renamed, but the major blocker that I hit was {{cloudpickle}}:
> {code}
> [joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
> Python 3.4.2 (default, Oct 19 2014, 17:52:17)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/shell.py", line 28, 
> in <module>
>     import pyspark
>   File "/Users/joshrosen/Documents/spark/python/pyspark/__init__.py", line 
> 41, in <module>
>     from pyspark.context import SparkContext
>   File "/Users/joshrosen/Documents/spark/python/pyspark/context.py", line 26, 
> in <module>
>     from pyspark import accumulators
>   File "/Users/joshrosen/Documents/spark/python/pyspark/accumulators.py", 
> line 97, in <module>
>     from pyspark.cloudpickle import CloudPickler
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 
> 120, in <module>
>     class CloudPickler(pickle.Pickler):
>   File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 
> 122, in CloudPickler
>     dispatch = pickle.Pickler.dispatch.copy()
> AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
> {code}
> This code looks like it will be hard difficult to port to Python 3, so this 
> might be a good reason to switch to 
> [Dill|https://github.com/uqfoundation/dill] for Python serialization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to