Hi, I ran ./python/run-tests to test the following modules of spark-1.5.1: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming'].
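For reference, each run looked roughly like this (--python-executables and --modules are the flags described by ./python/run-tests --help; the pyenv paths are just my local layout):

$ ./python/run-tests \
    --python-executables=/home/yahsuan/.pyenv/versions/pypy-2.5.1/bin/pypy \
    --modules=pyspark-core,pyspark-ml,pyspark-mllib,pyspark-sql,pyspark-streaming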
I tested against the following PyPy versions:

pypy-2.2.1
pypy-2.3
pypy-2.3.1
pypy-2.4.0
pypy-2.5.0
pypy-2.5.1
pypy-2.6.0
pypy-2.6.1
pypy-4.0.0

All of them pass except pypy-2.2.1. The error message from pypy-2.2.1 is:

Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py", line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py", line 101, in _get_module_details
    loader = get_loader(mod_name)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 465, in get_loader
    return find_loader(fullname)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 475, in find_loader
    for importer in iter_importers(fullname):
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 431, in iter_importers
    __import__(pkg)
  File "pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "pyspark/context.py", line 26, in <module>
    from pyspark import accumulators
  File "pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "pyspark/serializers.py", line 400, in <module>
    _hijack_namedtuple()
  File "pyspark/serializers.py", line 378, in _hijack_namedtuple
    _old_namedtuple = _copy_func(collections.namedtuple)
  File "pyspark/serializers.py", line 376, in _copy_func
    f.__defaults__, f.__closure__)
AttributeError: 'function' object has no attribute '__closure__'
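The traceback points at _copy_func in pyspark/serializers.py, which reads f.__defaults__ and f.__closure__. My guess is that PyPy 2.2.1's function objects only expose the classic Python 2 attribute names (func_code, func_defaults, func_closure) and not the __code__/__defaults__/__closure__ aliases that CPython 2.7 also provides. If that is the whole problem, a guarded version of _copy_func along these lines might restore compatibility -- an untested sketch, not the actual PySpark fix:

import types

def _copy_func(f):
    # Prefer the new-style attribute names (the only ones that exist on
    # Python 3), falling back to the classic Python 2 func_* names when
    # an alias is missing, as appears to be the case on PyPy 2.2.1.
    def attr(new, old):
        try:
            return getattr(f, new)
        except AttributeError:
            return getattr(f, old)
    return types.FunctionType(attr('__code__', 'func_code'),
                              attr('__globals__', 'func_globals'),
                              f.__name__,
                              attr('__defaults__', 'func_defaults'),
                              attr('__closure__', 'func_closure'))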
p.s. Would you want to test different PyPy versions on your Jenkins? Maybe I could help.

On Fri, Nov 6, 2015 at 2:23 AM, Josh Rosen <joshro...@databricks.com> wrote:

> You could try running PySpark's own unit tests. Try ./python/run-tests
> --help for instructions.
>
> On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>
>> I've tested the following PyPy versions against spark-1.5.1:
>>
>> pypy-2.2.1
>> pypy-2.3
>> pypy-2.3.1
>> pypy-2.4.0
>> pypy-2.5.0
>> pypy-2.5.1
>> pypy-2.6.0
>> pypy-2.6.1
>>
>> I ran
>>
>> $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy /path/to/spark-1.5.1/bin/pyspark
>>
>> and only pypy-2.2.1 failed.
>>
>> Any suggestions for running more advanced tests?
>>
>> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>>
>>> Thanks for your quick reply.
>>>
>>> I will test several PyPy versions and report the results later.
>>>
>>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>>
>>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>>> docs say that we only support PyPy 2.3+. Could you try using a newer
>>>> PyPy version to see if that works?
>>>>
>>>> I just checked and it looks like our Jenkins tests are running against
>>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the
>>>> actual minimum supported PyPy version is. Would you be interested in
>>>> helping to investigate so that we can update the documentation or
>>>> produce a fix to restore compatibility with earlier PyPy builds?
>>>>
>>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run pyspark with pypy; it works with spark-1.3.1 but
>>>>> fails with spark-1.4.1 and spark-1.5.1.
>>>>>
>>>>> My pypy version:
>>>>>
>>>>> $ /usr/bin/pypy --version
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>>
>>>>> It works with spark-1.3.1:
>>>>>
>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
>>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>> And now for something completely different: ``Armin: "Prolog is a
>>>>> mess.", CF: "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>>
>>>>> Error message for 1.5.1:
>>>>>
>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> Traceback (most recent call last):
>>>>>   File "app_main.py", line 72, in run_toplevel
>>>>>   File "app_main.py", line 614, in run_it
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
>>>>>     import pyspark
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
>>>>>     from pyspark.context import SparkContext
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
>>>>>     from pyspark import accumulators
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
>>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
>>>>>     _hijack_namedtuple()
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
>>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
>>>>>     f.__defaults__, f.__closure__)
>>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>>> And now for something completely different: ``the traces don't lie''
>>>>>
>>>>> Is this a known issue? Any suggestions to resolve it? Or how can I
>>>>> help to fix this problem?
>>>>>
>>>>> Thanks.
>>> --
>>> -- 張雅軒
>>
>> --
>> -- 張雅軒
>
--
-- 張雅軒