Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN
1, in AttributeError: 'Column' object has no attribute 'replace'. So I need some way to get the value of the column df.str in order to call replace on it. Any ideas how to do this? -- Viktor ARDELEAN
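
The error is raised because df.str is a Column expression that Spark evaluates later, not a Python string, so string methods such as replace are not available on it. One possible way around this, sketched here under assumptions, is the built-in pyspark.sql.functions.regexp_replace combined with withColumn. The pattern, the replacement, and the target column name new_str are hypothetical, since the post does not say what should be replaced.

```python
import re

# Hypothetical pattern and replacement; the original post does not say
# which characters should be replaced.
PATTERN = r"\s+"
REPLACEMENT = "_"


def replace_locally(value):
    """Plain-Python equivalent of the column expression, handy for checking the regex."""
    return re.sub(PATTERN, REPLACEMENT, value)


def add_replaced_column(df, source="str", target="new_str"):
    """Return df with a new string column derived from `source`.

    `df` is assumed to be a pyspark.sql.DataFrame; `target` is a
    hypothetical column name.
    """
    # Imported here so the regex helper above can be tried without Spark.
    from pyspark.sql import functions as F

    # regexp_replace builds a new Column expression; nothing is evaluated
    # until an action runs on the resulting DataFrame.
    return df.withColumn(
        target, F.regexp_replace(F.col(source), PATTERN, REPLACEMENT)
    )
```

Because the expression stays inside Spark's own functions, no Python function has to be shipped to the executors.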

Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN
t 13:04, <ndj...@gmail.com> wrote:
> Hi Viktor,
>
> Try to create a UDF. It's quite simple!
>
> Ardo.
>
> > On 10 Feb 2016, at 10:34, Viktor ARDELEAN <viktor0...@gmail.com> wrote:
> > Hello,
> > I want to add a new String column to the d
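
A minimal sketch of the UDF suggestion, assuming the source column is named str as in the original question. The replacement ("a" to "b") and the target column name new_str are hypothetical. Inside the UDF the function receives each cell's value as an ordinary Python string, so str.replace works there.

```python
def replace_value(value):
    """Ordinary Python function; in a UDF it receives the cell value, not a Column."""
    # Spark passes null cells as None, so guard before calling str methods.
    if value is None:
        return None
    return value.replace("a", "b")  # hypothetical replacement


def add_column_with_udf(df, source="str", target="new_str"):
    """Return df with `target` computed from `source` via a Python UDF.

    `df` is assumed to be a pyspark.sql.DataFrame.
    """
    # Imported here so replace_value can be tried without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    replace_udf = udf(replace_value, StringType())
    return df.withColumn(target, replace_udf(df[source]))
```

A Python UDF is more flexible than a built-in column function, but each row is serialized to a Python worker, so it is usually slower.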

Pyspark - how to use UDFs with dataframe groupby

2016-02-09 Thread Viktor ARDELEAN
Hello, I am using the following transformation on an RDD: rddAgg = df.map(lambda l: (Row(a = l.a, b= l.b, c = l.c), l))\ .aggregateByKey([], lambda accumulatorList, value: accumulatorList + [value], lambda list1, list2: [list1] + [list2]) I want to use the dataframe groupBy + agg
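
The aggregateByKey call above collects all rows that share the same (a, b, c) key into a list. Note that a combiner written as [list1] + [list2] wraps each partial list in another list, so merging partitions yields nested lists; list1 + list2 concatenates them instead. The sketch below emulates that merge in plain Python and then shows one possible DataFrame equivalent using groupBy with collect_list. It assumes a Spark version whose collect_list accepts struct columns, and the output column name rows is hypothetical.

```python
from functools import reduce


def seq_op(acc, value):
    """Within one partition: append a value to the accumulated list."""
    return acc + [value]


def comb_op(list1, list2):
    """Across partitions: concatenate the partial lists without extra nesting."""
    return list1 + list2


def aggregate_partitions(partitions):
    """Emulate aggregateByKey([], seq_op, comb_op) for a single key."""
    partials = [reduce(seq_op, part, []) for part in partitions]
    return reduce(comb_op, partials, [])


def group_rows(df):
    """Group a DataFrame by a, b, c and collect each group's rows into a list.

    `df` is assumed to be a pyspark.sql.DataFrame with columns a, b and c.
    """
    # Imported here so the helpers above can be tried without Spark.
    from pyspark.sql import functions as F

    return df.groupBy("a", "b", "c").agg(
        F.collect_list(F.struct(*df.columns)).alias("rows")
    )
```

For example, aggregate_partitions([[1, 2], [3]]) gives a flat list of the three values, which is the shape the RDD version was presumably after.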