Re: pyspark with pypy not work for spark-1.5.1

Chang Ya-Hsuan Fri, 06 Nov 2015 02:28:42 -0800

Hi, I ran ./python/run-tests to test the following modules of spark-1.5.1:

['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql',
'pyspark-streaming']


against the following pypy versions:

pypy-2.2.1  pypy-2.3  pypy-2.3.1  pypy-2.4.0  pypy-2.5.0  pypy-2.5.1
 pypy-2.6.0  pypy-2.6.1  pypy-4.0.0

All versions except pypy-2.2.1 pass the tests.

The error message for pypy-2.2.1 is:

Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py", line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py", line 101, in _get_module_details
    loader = get_loader(mod_name)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 465, in get_loader
    return find_loader(fullname)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 475, in find_loader
    for importer in iter_importers(fullname):
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line 431, in iter_importers
    __import__(pkg)
  File "pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "pyspark/context.py", line 26, in <module>
    from pyspark import accumulators
  File "pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "pyspark/serializers.py", line 400, in <module>
    _hijack_namedtuple()
  File "pyspark/serializers.py", line 378, in _hijack_namedtuple
    _old_namedtuple = _copy_func(collections.namedtuple)
  File "pyspark/serializers.py", line 376, in _copy_func
    f.__defaults__, f.__closure__)
AttributeError: 'function' object has no attribute '__closure__'
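
The traceback ends in _copy_func, at the point where f.__closure__ is read.
The arguments before it in the same call (f.__code__, f.__globals__,
f.__name__, f.__defaults__) were evaluated without error, so only the
__closure__ attribute appears to be missing on pypy-2.2.1. Below is a minimal
sketch of a more tolerant copy helper. It assumes the older Python 2 spelling
func_closure is available on that interpreter, which I have not verified.
copy_func, _get_closure and make_adder are hypothetical names for
illustration, not the code in serializers.py.

```python
import types


def _get_closure(f):
    # Prefer the modern attribute name; fall back to the older Python 2
    # spelling. ASSUMPTION: interpreters lacking __closure__ still expose
    # func_closure. Both may legitimately be None for a non-closure function.
    closure = getattr(f, "__closure__", None)
    if closure is None:
        closure = getattr(f, "func_closure", None)
    return closure


def copy_func(f):
    # Build a new function object that shares code, globals, defaults and
    # closure cells with f.
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, _get_closure(f))


def make_adder(n):
    # Small closure used only to exercise copy_func.
    def add(x, y=0):
        return x + y + n
    return add


add5 = make_adder(5)
add5_copy = copy_func(add5)
print(add5_copy(1))
```

The copy is a distinct function object that behaves like the original, so a
helper along these lines might let the import get past this point on older
interpreters. Whether the rest of pyspark then works on pypy-2.2.1 is a
separate question.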

P.S. Would you like to test different pypy versions on your Jenkins? Maybe
I could help.
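
To repeat the version matrix above in one run, something like the following
sketch could build the command line. It assumes the pyenv layout visible in
the traceback (versions/pypy-X/bin/pypy) and that run-tests accepts a
--python-executables option taking a comma-separated list; please confirm the
option with ./python/run-tests --help before relying on it. The function
names and the paths in the usage lines are placeholders.

```python
import os

# Versions taken from the list in this message.
PYPY_VERSIONS = ["2.2.1", "2.3", "2.3.1", "2.4.0", "2.5.0",
                 "2.5.1", "2.6.0", "2.6.1", "4.0.0"]


def pypy_executable(pyenv_root, version):
    # ASSUMPTION: pyenv installs each interpreter under
    # <pyenv_root>/versions/pypy-<version>/bin/pypy.
    return os.path.join(pyenv_root, "versions", "pypy-" + version,
                        "bin", "pypy")


def build_run_tests_command(spark_home, pyenv_root, versions):
    # ASSUMPTION: run-tests takes --python-executables=<comma-separated list>.
    executables = ",".join(pypy_executable(pyenv_root, v) for v in versions)
    return [os.path.join(spark_home, "python", "run-tests"),
            "--python-executables=" + executables]


# Placeholder paths; only builds and prints the command, does not run it.
command = build_run_tests_command("/path/to/spark-1.5.1",
                                  "/path/to/.pyenv",
                                  PYPY_VERSIONS)
print(" ".join(command))
```

The resulting list can be passed to subprocess.call once the paths are
filled in.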

On Fri, Nov 6, 2015 at 2:23 AM, Josh Rosen <joshro...@databricks.com> wrote:

> You could try running PySpark's own unit tests. Try ./python/run-tests
> --help for instructions.
>
> On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <sumti...@gmail.com> wrote:
>
>> I've tested the following pypy versions against spark-1.5.1:
>>
>>   pypy-2.2.1
>>   pypy-2.3
>>   pypy-2.3.1
>>   pypy-2.4.0
>>   pypy-2.5.0
>>   pypy-2.5.1
>>   pypy-2.6.0
>>   pypy-2.6.1
>>
>> I ran
>>
>>     $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy /path/to/spark-1.5.1/bin/pyspark
>>
>> and only pypy-2.2.1 failed.
>>
>> Any suggestions for running more advanced tests?
>>
>> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <sumti...@gmail.com>
>> wrote:
>>
>>> Thanks for your quick reply.
>>>
>>> I will test several pypy versions and report the results later.
>>>
>>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>>>
>>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>>>> version to see if that works?
>>>>
>>>> I just checked and it looks like our Jenkins tests are running against
>>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>>>> minimum supported PyPy version is. Would you be interested in helping to
>>>> investigate so that we can update the documentation or produce a fix to
>>>> restore compatibility with earlier PyPy builds?
>>>>
>>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <sumti...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run pyspark with pypy. It works with spark-1.3.1
>>>>> but fails with spark-1.4.1 and spark-1.5.1.
>>>>>
>>>>> my pypy version:
>>>>>
>>>>> $ /usr/bin/pypy --version
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>>
>>>>> works with spark-1.3.1
>>>>>
>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
>>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
>>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>> And now for something completely different: ``Armin: "Prolog is a
>>>>> mess.", CF:
>>>>> "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>> >>>
>>>>>
>>>>> error message for 1.5.1
>>>>>
>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> Traceback (most recent call last):
>>>>>   File "app_main.py", line 72, in run_toplevel
>>>>>   File "app_main.py", line 614, in run_it
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line 30, in <module>
>>>>>     import pyspark
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py", line 41, in <module>
>>>>>     from pyspark.context import SparkContext
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py", line 26, in <module>
>>>>>     from pyspark import accumulators
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py", line 98, in <module>
>>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 400, in <module>
>>>>>     _hijack_namedtuple()
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 378, in _hijack_namedtuple
>>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>>   File "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py", line 376, in _copy_func
>>>>>     f.__defaults__, f.__closure__)
>>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>>> And now for something completely different: ``the traces don't lie''
>>>>>
>>>>> Is this a known issue? Any suggestions for resolving it? Or how can
>>>>> I help fix this problem?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> -- 張雅軒
>>>
>>
>>
>>
>> --
>> -- 張雅軒
>>
>


-- 
-- 張雅軒
