Egor Pahomov created ZEPPELIN-1442:
--------------------------------------
Summary: Pyspark in Zeppelin does not support UDF
Key: ZEPPELIN-1442
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1442
Project: Zeppelin
Issue Type: Bug
Components: pySpark, python-interpreter
Affects Versions: 0.6.1
Reporter: Egor Pahomov
This worked in Zeppelin 0.5.6 with Spark 1.6.2.
With Spark 2.0, I've checked that everything works fine when I run plain pySpark on its own.
I built Zeppelin with:
{code}
mvn clean package -Pspark-2.0 -Phadoop-2.6 -Pyarn -Ppyspark -DskipTests
{code}
{code}
%pyspark
from pyspark.sql import SQLContext, Row
from pyspark.sql.types import StringType, IntegerType, StructType, StructField, MapType, FloatType, ArrayType
from pyspark.sql.functions import udf
sqlContext.registerFunction("stringLengthString", lambda x: len(x))
{code}
Returns:
{code}
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-3201872850141735060.py", line 266, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-3201872850141735060.py", line 264, in <module>
exec(code)
File "<stdin>", line 7, in <module>
File "/home/egor/spark/python/pyspark/sql/context.py", line 203, in registerFunction
self.sparkSession.catalog.registerFunction(name, f, returnType)
AttributeError: 'JavaMember' object has no attribute 'registerFunction'
{code}
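The traceback shows that {{self.sparkSession.catalog}} evaluated to a {{JavaMember}} object, which in py4j is the handle for a Java method that has not been called. One possible reading, not confirmed in this report, is that {{sqlContext.sparkSession}} holds a raw JVM object instead of a Python-side session. The sketch below reproduces only that mechanism in plain Python. Every class in it is a hypothetical stand-in, not the real pyspark or py4j type.

```python
# Stand-ins only: none of these classes are the real pyspark or py4j types.


class JavaMember:
    """Hypothetical stand-in for a handle to an uncalled Java method."""

    def __init__(self, name):
        self.name = name


class JavaObjectStandIn:
    """Hypothetical stand-in for a JVM object seen through a gateway.

    Any attribute lookup yields a method handle, never a Python wrapper.
    """

    def __getattr__(self, name):
        return JavaMember(name)


class PythonCatalog:
    """Hypothetical stand-in for a Python-side catalog wrapper."""

    def __init__(self):
        self.functions = {}

    def registerFunction(self, name, f, returnType=None):
        self.functions[name] = f


class PythonSession:
    """Hypothetical stand-in for a Python-side session wrapper."""

    def __init__(self):
        self.catalog = PythonCatalog()


class Context:
    """Mirrors only the delegation visible in the traceback."""

    def __init__(self, session):
        self.sparkSession = session

    def registerFunction(self, name, f, returnType=None):
        self.sparkSession.catalog.registerFunction(name, f, returnType)


def try_register(session):
    """Return "ok" on success, or the AttributeError message on failure."""
    try:
        Context(session).registerFunction("stringLengthString", lambda x: len(x))
        return "ok"
    except AttributeError as exc:
        return str(exc)


ok_result = try_register(PythonSession())       # Python-side session
bad_result = try_register(JavaObjectStandIn())  # raw JVM-like object
print(ok_result)
print(bad_result)
```

If this reading is right, checking {{type(sqlContext.sparkSession)}} in a {{%pyspark}} paragraph would show which kind of object the interpreter injected.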