RE: Nested UDFs

2016-11-17 Thread Mendelson, Assaf
The second option is more correct and should provide better performance.

From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
Sent: Thursday, November 17, 2016 1:50 PM
To: user@spark.apache.org
Subject: Re: Nested UDFs

Hi,

My example was a little bogus; my real use case is to do multiple regexp
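The snippet does not show which two options were being compared. Below is a hedged sketch of one way to handle several regexp replacements on a single column without any Python UDF. It folds the built-in regexp_replace over a list of (pattern, replacement) rules, so each step wraps the previous Column expression. The rules list and both function names are hypothetical, not from the thread. A plain-Python equivalent using re.sub is included for comparison, since that is what a UDF body would do per row. Note that Spark uses Java regex syntax and Python uses its own dialect, so advanced patterns may not behave identically.

```python
import re
from functools import reduce

# Hypothetical replacement rules, for illustration only (not from the thread).
RULES = [("a", "X"), ("b", "Y")]


def replace_all_column(col, rules):
    """Nest Spark's built-in regexp_replace once per rule.

    The result is one Column expression evaluated by Spark itself,
    so no Python UDF is involved.
    """
    # Imported here so this module can be loaded without PySpark installed.
    from pyspark.sql.functions import regexp_replace

    return reduce(lambda c, rule: regexp_replace(c, rule[0], rule[1]), rules, col)


def replace_all_python(value, rules):
    """Apply the same rules, in order, to one plain Python string (UDF-body style)."""
    return reduce(lambda s, rule: re.sub(rule[0], rule[1], s), rules, value)
```

With a DataFrame like the thread's test_data, usage would look like `test_data.select(replace_all_column(test_data.name, RULES).alias("name")).show()`. The rules are applied in list order, so a later rule sees the output of earlier ones.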

Re: Nested UDFs

2016-11-17 Thread Perttu Ranta-aho
(test_data.name, 'a', 'X')
>
> You would need a UDF if you wanted to do something with the string
> value of a single row (e.g. return data + "bla").
>
> Assaf.
>
> From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
> Sent: Thursday, November 17, 2
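To make the distinction concrete, here is a hedged sketch. It assumes a DataFrame with a string column name, like test_data in the thread; the names add_bla and build_columns are hypothetical. The regexp_replace call is used directly as a column expression. The UDF wraps ordinary Python logic that receives each row's string value, as in the "return data + "bla"" example above.

```python
def add_bla(data):
    """Per-row Python logic: `data` is the plain str value of one row (None for null)."""
    if data is None:
        return None
    return data + "bla"


def build_columns(df):
    """Select a built-in column expression and a UDF result side by side."""
    # Imported lazily so add_bla can be used and tested without PySpark.
    from pyspark.sql.functions import regexp_replace, udf
    from pyspark.sql.types import StringType

    add_bla_udf = udf(add_bla, StringType())
    return df.select(
        # Built-in function: builds a Column expression, no UDF needed.
        regexp_replace(df.name, "a", "X").alias("replaced"),
        # UDF: Spark calls add_bla on each row's string value.
        add_bla_udf(df.name).alias("with_bla"),
    )
```

Calling `build_columns(test_data).show()` would display both columns. The built-in expression stays inside Spark's engine, while the UDF requires shipping each value to a Python worker.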

RE: Nested UDFs

2016-11-16 Thread Mendelson, Assaf
From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
Sent: Thursday, November 17, 2016 9:15 AM
To: user@spark.apache.org
Subject: Nested UDFs

Hi,

Shouldn't this work?

from pyspark.sql.functions import regexp_replace, udf

def my_f(data):
    return regexp_replace(data, 'a', 'X')

my_udf = udf(my_f)

test_data

Nested UDFs

2016-11-16 Thread Perttu Ranta-aho
Hi,

Shouldn't this work?

from pyspark.sql.functions import regexp_replace, udf

def my_f(data):
    return regexp_replace(data, 'a', 'X')

my_udf = udf(my_f)

test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
test_data.select(my_udf(test_data.name)).show()

But instead
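The message is cut off before the error, so what follows is a hedged explanation rather than a quote from the thread. Inside a UDF, data arrives as a plain Python string. By contrast, regexp_replace from pyspark.sql.functions builds a Column expression meant to be composed when defining the query, not applied to row values on a worker. A UDF body that needs a regex replacement can use Python's re module instead. For a single replacement like this one, calling regexp_replace directly on the column avoids the UDF entirely. The names my_f_fixed and run_example are hypothetical, and run_example assumes an existing SparkSession passed in as spark.

```python
import re


def my_f_fixed(data):
    """UDF body: operate on the plain Python str value with the re module."""
    if data is None:  # null rows arrive as None
        return None
    return re.sub("a", "X", data)


def run_example(spark):
    """Run both variants on the thread's sample data using an existing SparkSession."""
    from pyspark.sql.functions import regexp_replace, udf
    from pyspark.sql.types import StringType

    test_data = spark.createDataFrame([("a",), ("b",), ("c",)], ["name"])
    my_udf = udf(my_f_fixed, StringType())
    test_data.select(my_udf(test_data.name)).show()  # Python UDF using re.sub
    test_data.select(regexp_replace(test_data.name, "a", "X")).show()  # built-in, no UDF
```

Both selects should replace "a" with "X" in the name column. The second one matches Assaf's advice to skip the UDF when a built-in function already does the job.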