Hi all, I am trying to run pyspark with pypy. It works with spark-1.3.1 but fails with spark-1.4.1 and spark-1.5.1.
My pypy version:

$ /usr/bin/pypy --version
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4]

It works with spark-1.3.1:

$ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
SparkContext available as sc, HiveContext available as sqlContext.
And now for something completely different: ``Armin: "Prolog is a mess.", CF: "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>

Error message for 1.5.1:

$ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "app_main.py", line 614, in run_it
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
    from pyspark import accumulators
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
    _hijack_namedtuple()
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
    _old_namedtuple = _copy_func(collections.namedtuple)
  File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
    f.__defaults__, f.__closure__)
AttributeError: 'function' object has no attribute '__closure__'
And now for something completely different: ``the traces don't lie''

Is this a known issue? Any suggestions to resolve it? Or how can I help fix this problem?

Thanks.
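P.S. For what it's worth, the traceback shows the failure comes from _copy_func in pyspark/serializers.py, which rebuilds collections.namedtuple from its __code__/__globals__/__defaults__/__closure__ attributes, and this PyPy build apparently does not expose __closure__. Below is a minimal sketch of the same copy done with getattr fallbacks to the legacy Python 2 attribute names. This is not Spark's actual code: _copy_func_compat is a name I made up, and it assumes the old PyPy still exposes func_code, func_globals, func_defaults and func_closure, which I have not verified.

import collections
import types


def _copy_func_compat(f):
    # Copy a function the way pyspark's _copy_func does, but fall back to
    # the legacy Python 2 attribute names (func_code, func_closure, ...)
    # when the __closure__-style names are missing (assumed behaviour of
    # PyPy 2.2.1, not verified).
    code = getattr(f, '__code__', None) or getattr(f, 'func_code')
    glbs = getattr(f, '__globals__', None) or getattr(f, 'func_globals')
    defaults = getattr(f, '__defaults__', getattr(f, 'func_defaults', None))
    closure = getattr(f, '__closure__', getattr(f, 'func_closure', None))
    return types.FunctionType(code, glbs, f.__name__, defaults, closure)


# The call that currently raises AttributeError inside _hijack_namedtuple():
_old_namedtuple = _copy_func_compat(collections.namedtuple)
print(_old_namedtuple('Point', ['x', 'y'])(1, 2))

If that assumption holds, patching serializers.py along these lines (or upgrading to a newer PyPy that does expose __closure__) might be enough to get the 1.5.1 shell past the import.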