[ https://issues.apache.org/jira/browse/SPARK-23569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23569.
----------------------------------
Resolution: Fixed
Fix Version/s: 2.4.0, 2.3.1
Fixed in https://github.com/apache/spark/pull/20728
> pandas_udf does not work with type-annotated python functions
> -------------------------------------------------------------
>
> Key: SPARK-23569
> URL: https://issues.apache.org/jira/browse/SPARK-23569
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.3.0
> Environment: python 3.6 | pyspark 2.3.0 | Using Scala version 2.11.8,
> OpenJDK 64-Bit Server VM, 1.8.0_141 | Revision
> a0d7949896e70f427e7f3942ff340c9484ff0aab
> Reporter: Stu (Michael Stewart)
> Assignee: Stu (Michael Stewart)
> Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> When invoked against a type-annotated function, pandas_udf raises:
> `ValueError: Function has keyword-only parameters or annotations, use
> getfullargspec() API which can support them`
>
> The deprecated `getargspec` call occurs in `pyspark/sql/udf.py`:
> {code:java}
> def _create_udf(f, returnType, evalType):
>     if evalType in (PythonEvalType.SQL_SCALAR_PANDAS_UDF,
>                     PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF):
>         import inspect
>         from pyspark.sql.utils import require_minimum_pyarrow_version
>         require_minimum_pyarrow_version()
>         argspec = inspect.getargspec(f)
> ...{code}
> To reproduce:
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit
> spark = SparkSession.builder.getOrCreate()
> df = spark.range(12).withColumn('b', col('id') * 2)
> def ok(a,b): return a*b
> df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')).show()
> # no problems
> import pandas as pd
> def ok(a: pd.Series,b: pd.Series) -> pd.Series: return a*b
> df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b'))
>
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-17-2e6ae67b15ee> in <module>()
> ----> 1 df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b'))
>
> /opt/miniconda/lib/python3.6/site-packages/pyspark/sql/functions.py in pandas_udf(f, returnType, functionType)
>    2277         return functools.partial(_create_udf, returnType=return_type, evalType=eval_type)
>    2278     else:
> -> 2279         return _create_udf(f=f, returnType=return_type, evalType=eval_type)
>    2280
>    2281
>
> /opt/miniconda/lib/python3.6/site-packages/pyspark/sql/udf.py in _create_udf(f, returnType, evalType)
>      44
>      45         require_minimum_pyarrow_version()
> ---> 46         argspec = inspect.getargspec(f)
>      47
>      48         if evalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF and len(argspec.args) == 0 and \
>
> /opt/miniconda/lib/python3.6/inspect.py in getargspec(func)
>    1043         getfullargspec(func)
>    1044     if kwonlyargs or ann:
> -> 1045         raise ValueError("Function has keyword-only parameters or annotations"
>    1046                          ", use getfullargspec() API which can support them")
>    1047     return ArgSpec(args, varargs, varkw, defaults)
>
> ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them
> {code}
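
For readers following the traceback above, here is a minimal sketch of the kind of change the error message itself suggests: call `inspect.getfullargspec` on Python 3 and fall back to `inspect.getargspec` only on Python 2. This is an illustration, not necessarily the patch merged in the pull request. The helper name `get_argspec` is hypothetical, and the Python 2 fallback is included on the assumption that the affected release still had to run there. Plain `int` annotations stand in for `pd.Series` so the snippet runs without pandas or Spark.

```python
import inspect
import sys


def get_argspec(f):
    """Hypothetical helper: return an argspec for f that tolerates annotations."""
    if sys.version_info[0] < 3:
        # Python 2 has no getfullargspec, and no annotation syntax to trip over.
        return inspect.getargspec(f)
    # getfullargspec accepts annotated and keyword-only parameters.
    return inspect.getfullargspec(f)


def plain(a, b):
    return a * b


# Python 3 only syntax; stands in for the annotated UDF from the report.
def annotated(a: int, b: int) -> int:
    return a * b


# Both forms expose the same positional argument names.
print(get_argspec(plain).args)
print(get_argspec(annotated).args)
```

On Python 3 both calls print `['a', 'b']`, so code that only checks `len(argspec.args)`, as the line shown in the traceback does, behaves the same for annotated and unannotated functions.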