Stu (Michael Stewart) created SPARK-23569: ---------------------------------------------
Summary: pandas_udf does not work with type-annotated python functions Key: SPARK-23569 URL: https://issues.apache.org/jira/browse/SPARK-23569 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.3.0 Environment: pyspark 2.3.0 ``` pyspark --version Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.3.0 /_/ Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_141 Branch master Compiled by user sameera on 2018-02-22T19:24:29Z Revision a0d7949896e70f427e7f3942ff340c9484ff0aab Url g...@github.com:sameeragarwal/spark.git Type --help for more information. ``` Reporter: Stu (Michael Stewart) When invoked against a type annotated function pandas_udf raises: `ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them` To reproduce: ``` from pyspark.sql import SparkSession from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit spark = SparkSession.builder.getOrCreate() df = spark.range(12).withColumn('b', col('id') * 2) def ok(a,b): return a*b df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')).show() # no problems import pandas as pd def ok(a: pd.Series,b: pd.Series) -> pd.Series: return a*b df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-17-2e6ae67b15ee> in <module>() ----> 1 df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b')) /opt/miniconda/lib/python3.6/site-packages/pyspark/sql/functions.py in pandas_udf(f, returnType, functionType) 2277 return functools.partial(_create_udf, returnType=return_type, evalType=eval_type) 2278 else: -> 2279 return _create_udf(f=f, returnType=return_type, evalType=eval_type) 2280 2281 /opt/miniconda/lib/python3.6/site-packages/pyspark/sql/udf.py in _create_udf(f, returnType, evalType) 44 45 require_minimum_pyarrow_version() ---> 46 argspec = inspect.getargspec(f) 47 48 if evalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF and len(argspec.args) == 0 and \ /opt/miniconda/lib/python3.6/inspect.py in getargspec(func) 1043 getfullargspec(func) 1044 if kwonlyargs or ann: -> 1045 raise ValueError("Function has keyword-only parameters or annotations" 1046 ", use getfullargspec() API which can support them") 1047 return ArgSpec(args, varargs, varkw, defaults) ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org