Sojan James created ZEPPELIN-1411:
-------------------------------------
Summary: UDF with pyspark not working
Key: ZEPPELIN-1411
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1411
Project: Zeppelin
Issue Type: Bug
Components: python-interpreter
Affects Versions: 0.6.1
Reporter: Sojan James
The following PySpark UDF example does not work: creating the UDF (line 3 of the snippet) raises an AttributeError.
{code}
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())
df = sqlContext.createDataFrame([{'name': 'Alice', 'age': 1}])
df.withColumn("maturity", maturity_udf(df.age))
{code}
Stack trace
{code}
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 266, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 259, in <module>
    exec(code)
  File "<stdin>", line 3, in <module>
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1789, in udf
    return UserDefinedFunction(f, returnType)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1751, in __init__
    self._judf = self._create_judf(name)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1758, in _create_judf
    jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'
{code}
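A possible way to narrow this down: the trace shows the failure on `ctx._ssql_ctx`, which is reported as a 'JavaMember' rather than an object exposing `parseDataType`. The sketch below is only a diagnostic aid, not a fix. The helper name `describe_sql_ctx` is hypothetical, and it assumes the `sqlContext` that Zeppelin injects is the same context the UDF code ends up using, which may not hold.

```python
def describe_sql_ctx(sql_context):
    """Report what sits behind sql_context._ssql_ctx.

    `_ssql_ctx` is the private attribute that appears in the stack trace.
    `describe_sql_ctx` is a hypothetical helper written for this report.
    Returns (type name of the wrapped object, whether it has parseDataType).
    """
    inner = getattr(sql_context, "_ssql_ctx", None)
    return type(inner).__name__, hasattr(inner, "parseDataType")

# In a %pyspark paragraph one could then run, for example:
#   print(describe_sql_ctx(sqlContext))
```

If the first element of the result is 'JavaMember', the context appears to be holding an unbound Java member instead of the Java SQLContext object, which would match the AttributeError above.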