[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127330#comment-16127330 ]

Mathias M. Andersen commented on SPARK-19019:
---------------------------------------------

Just got this error post-fix on Spark 2.1:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/anaconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/usr/hdp/current/spark-client/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/usr/hdp/current/spark-client/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/usr/hdp/current/spark-client/python/pyspark/java_gateway.py", line 25, in <module>
    import platform
  File "/opt/anaconda3/lib/python3.6/platform.py", line 886, in <module>
    "system node release version machine processor")
  File "/usr/hdp/current/spark-client/python/pyspark/serializers.py", line 381, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

> PySpark does not work with Python 3.6.0
> ---------------------------------------
>
>                 Key: SPARK-19019
>                 URL: https://issues.apache.org/jira/browse/SPARK-19019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>             Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
>
> Currently, PySpark does not work with Python 3.6.0.
> Running {{./bin/pyspark}} simply throws the error as below:
> {code}
> Traceback (most recent call last):
>   File ".../spark/python/pyspark/shell.py", line 30, in <module>
>     import pyspark
>   File ".../spark/python/pyspark/__init__.py", line 46, in <module>
>     from pyspark.context import SparkContext
>   File ".../spark/python/pyspark/context.py", line 36, in <module>
>     from pyspark.java_gateway import launch_gateway
>   File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
>     from py4j.java_gateway import java_import, JavaGateway, GatewayClient
>   File "<frozen importlib._bootstrap>", line 961, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
>   File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
>   File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
>   File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
>     import pkgutil
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
>     ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
>   File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
>     cls = _old_namedtuple(*args, **kwargs)
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> The problem is in 
> https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394
> as the error says, and the cause seems to be that the arguments of 
> {{namedtuple}} became completely keyword-only in Python 3.6.0 
> (see https://bugs.python.org/issue25628).
> We currently copy this function via {{types.FunctionType}}, which does not set 
> the default values of keyword-only arguments (meaning 
> {{namedtuple.__kwdefaults__}}), and this seems to leave those arguments 
> unbound (missing values) in the copied function.
> This ends up as below:
> {code}
> import types
> import collections
> def _copy_func(f):
>     return types.FunctionType(f.__code__, f.__globals__, f.__name__,
>         f.__defaults__, f.__closure__)
> _old_namedtuple = _copy_func(collections.namedtuple)
> {code}
> If we call as below:
> {code}
> >>> _old_namedtuple("a", "b")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 
> 'rename', and 'module'
> {code}
> It throws an exception as above because {{__kwdefaults__}} for the required 
> keyword-only arguments is unset in the copied function. So, if we give explicit 
> values for these,
> {code}
> >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None)
> <class '__main__.a'>
> {code}
> It works fine.
> It seems we should now properly set these on the hijacked one as well.
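
For reference, the kind of fix described above (carrying {{__kwdefaults__}} over to the copied function) can be sketched like this. This is a minimal illustration of the idea, not the actual Spark patch:

{code}
import types
import collections

def _copy_func(f):
    # types.FunctionType copies the code object, globals, positional
    # defaults, and closure -- but NOT __kwdefaults__, so the defaults of
    # keyword-only arguments are lost in the copy.
    fn = types.FunctionType(f.__code__, f.__globals__, f.__name__,
                            f.__defaults__, f.__closure__)
    # Carrying the keyword-only defaults over restores the original behavior.
    fn.__kwdefaults__ = f.__kwdefaults__
    return fn

_old_namedtuple = _copy_func(collections.namedtuple)
# Now the copy can be called without spelling out the keyword-only arguments.
Point = _old_namedtuple("Point", "x y")
print(Point(1, 2))
{code}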



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
