Re: Nested UDFs

2016-11-17 Thread Perttu Ranta-aho
> (test_data.name, 'a', 'X')
>
> You would need a UDF if you wanted to do something with the string
> value of a single row (e.g. return data + "bla")
>
> Assaf.
>
> *From:* Perttu Ranta-aho [mailto:ranta...@iki.fi]
> *Sent:* Thursday, November 17, 2
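A minimal sketch of the distinction drawn in the reply above: `regexp_replace` builds a column expression and can be passed to `select` directly, while a UDF wraps plain Python code that receives an ordinary string for each row. The names `replace_a_with_x` and `show_both_approaches`, and the `spark` session argument, are hypothetical and not from the thread; the thread itself uses `sqlContext`.

```python
import re


def replace_a_with_x(value):
    """Plain Python: receives an ordinary str (or None) for a single row."""
    if value is None:
        return None
    return re.sub('a', 'X', value)


def show_both_approaches(spark):
    """Assumes `spark` is an existing SparkSession (hypothetical helper)."""
    # Imported here so the plain-Python function above can be used
    # without PySpark installed.
    from pyspark.sql.functions import regexp_replace, udf
    from pyspark.sql.types import StringType

    test_data = spark.createDataFrame([('a',), ('b',), ('c',)], ['name'])

    # Column expression: applied to the column directly, no UDF involved.
    test_data.select(regexp_replace(test_data.name, 'a', 'X')).show()

    # Per-row Python logic: the UDF body uses Python's own `re` module,
    # not pyspark.sql.functions, because it is handed a plain string.
    my_udf = udf(replace_a_with_x, StringType())
    test_data.select(my_udf(test_data.name)).show()
```

Calling `show_both_approaches(spark)` with a live session should print the same replaced values twice; the first form is usually preferable when a built-in function already covers the need.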

Nested UDFs

2016-11-16 Thread Perttu Ranta-aho
Hi,

Shouldn't this work?

    from pyspark.sql.functions import regexp_replace, udf

    def my_f(data):
        return regexp_replace(data, 'a', 'X')

    my_udf = udf(my_f)

    test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
    test_data.select(my_udf(test_data.name)).show()

But instead

Re: UDF with column value comparison fails with PySpark

2016-11-10 Thread Perttu Ranta-aho
So it was something obvious, thanks!

-Perttu

On Thu, 10 November 2016 at 21:19, Davies Liu <dav...@databricks.com> wrote:
> On Thu, Nov 10, 2016 at 11:14 AM, Perttu Ranta-aho <ranta...@iki.fi> wrote:
> > Hello,
> >
> > I want to create a UDF which

UDF with column value comparison fails with PySpark

2016-11-10 Thread Perttu Ranta-aho
Hello,

I want to create a UDF which modifies one column value depending on the value of some other column. But the Python version of the code always fails in the column value comparison. Below are simple examples; the Scala version works as expected, but the Python version throws an exception. Am I missing something

Re: PySpark Mesos random crashes

2014-05-26 Thread Perttu Ranta-aho
has to be found in whatever is causing the DAGScheduler to need to shut down in the first place.

On Sun, May 25, 2014 at 12:10 PM, Perttu Ranta-aho perttu.ranta...@gmail.com wrote:

Hi,

We have a small Mesos (0.18.1) cluster with 4 nodes. Upgraded to Spark 1.0.0-rc9, to overcome some

PySpark Mesos random crashes

2014-05-25 Thread Perttu Ranta-aho
Hi,

We have a small Mesos (0.18.1) cluster with 4 nodes. We upgraded to Spark 1.0.0-rc9 to overcome some PySpark bugs, but now we are experiencing random crashes with almost every job. Local jobs run fine, but the same code with the same data set on the Mesos cluster leads to errors like:

14/05/22 15:03:34