[jira] [Created] (SPARK-23569) pandas_udf does not work with type-annotated python functions

Stu (Michael Stewart) (JIRA) Fri, 02 Mar 2018 11:05:39 -0800

Stu (Michael Stewart) created SPARK-23569:
---------------------------------------------


             Summary: pandas_udf does not work with type-annotated python functions
                 Key: SPARK-23569
                 URL: https://issues.apache.org/jira/browse/SPARK-23569
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0
         Environment: pyspark 2.3.0

 

```
pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_141
Branch master
Compiled by user sameera on 2018-02-22T19:24:29Z
Revision a0d7949896e70f427e7f3942ff340c9484ff0aab
Url g...@github.com:sameeragarwal/spark.git
Type --help for more information.
```
            Reporter: Stu (Michael Stewart)


When invoked on a type-annotated function, pandas_udf raises:

`ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them`

 

To reproduce:

```

from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType, col, lit

spark = SparkSession.builder.getOrCreate()
df = spark.range(12).withColumn('b', col('id') * 2)

# Unannotated function: works as expected.
def ok(a, b): return a * b

df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id', 'b')).show()  # no problems

import pandas as pd

# Same function with type annotations: pandas_udf raises ValueError.
def ok(a: pd.Series, b: pd.Series) -> pd.Series: return a * b

df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id', 'b'))

 

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-2e6ae67b15ee> in <module>()
----> 1 df.withColumn('ok', pandas_udf(f=ok, returnType='bigint')('id','b'))

/opt/miniconda/lib/python3.6/site-packages/pyspark/sql/functions.py in pandas_udf(f, returnType, functionType)
 2277 return functools.partial(_create_udf, returnType=return_type, evalType=eval_type)
 2278 else:
-> 2279 return _create_udf(f=f, returnType=return_type, evalType=eval_type)
 2280
 2281

/opt/miniconda/lib/python3.6/site-packages/pyspark/sql/udf.py in _create_udf(f, returnType, evalType)
 44
 45 require_minimum_pyarrow_version()
---> 46 argspec = inspect.getargspec(f)
 47
 48 if evalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF and len(argspec.args) == 0 and \

/opt/miniconda/lib/python3.6/inspect.py in getargspec(func)
 1043 getfullargspec(func)
 1044 if kwonlyargs or ann:
-> 1045 raise ValueError("Function has keyword-only parameters or annotations"
 1046 ", use getfullargspec() API which can support them")
 1047 return ArgSpec(args, varargs, varkw, defaults)

ValueError: Function has keyword-only parameters or annotations, use getfullargspec() API which can support them

```
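
The traceback above points at the `inspect.getargspec(f)` call in `_create_udf`, and the error message itself names `inspect.getfullargspec()` as the API that supports annotations. The sketch below is an illustration only, not the actual Spark fix. It uses a plain-Python function rather than a pandas one so that it runs without Spark, and `strip_annotations` is a hypothetical helper that is not part of PySpark. It shows that `getfullargspec` still reports the positional argument names and varargs that `_create_udf` inspects, and that a copy of the function made without annotations no longer carries the metadata that triggers the ValueError.

```python
import inspect
import types


def annotated(a: int, b: int) -> int:
    """Stand-in for the annotated UDF body from the report."""
    return a * b


# getfullargspec() accepts annotated functions and exposes the same
# positional-argument information that getargspec() would have returned.
spec = inspect.getfullargspec(annotated)
print(spec.args)     # ['a', 'b']
print(spec.varargs)  # None


def strip_annotations(f):
    """Hypothetical helper (an assumption, not a PySpark API).

    Builds a new function object that shares f's code, globals, defaults
    and closure, but starts with an empty __annotations__ mapping.
    """
    g = types.FunctionType(
        f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__
    )
    g.__kwdefaults__ = f.__kwdefaults__
    return g


plain = strip_annotations(annotated)
print(inspect.getfullargspec(plain).annotations)  # {}
print(plain(3, 4))                                # 12
```

Assuming the `getargspec` call is the only thing that rejects the annotated function, passing `strip_annotations(ok)` to `pandas_udf` might serve as a workaround on 2.3.0. That has not been verified against Spark here.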

 



