Hi,

Shouldn't this work?

    from pyspark.sql.functions import regexp_replace, udf

    def my_f(data):
        return regexp_replace(data, 'a', 'X')

    my_udf = udf(my_f)

    test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
    test_data.select(my_udf(test_data.name)).show()

But instead of 'a' being replaced with 'X', I get an exception:

    File ".../spark-2.0.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/functions.py", line 1471, in regexp_replace
      jc = sc._jvm.functions.regexp_replace(_to_java_column(str), pattern, replacement)
    AttributeError: 'NoneType' object has no attribute '_jvm'

???

-Perttu