Re: Pyspark - how to use UDFs with dataframe groupby

2016-02-10 Thread Davies Liu
short answer: PySpark does not support UDAF (user-defined aggregate functions) for now.

On Tue, Feb 9, 2016 at 11:44 PM, Viktor ARDELEAN wrote:
> Hello,
>
> I am using following transformations on RDD:
>
> rddAgg = df.map(lambda l: (Row(a = l.a, b= l.b, c = l.c), l))\
>
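Since UDAFs are not supported, a common workaround in this situation is to collect each key's values into a list (via `aggregateByKey`, `groupByKey`, or, on DataFrames, `collect_list`) and then apply an ordinary Python function to the list. Below is a minimal pure-Python sketch of that collect-into-a-list pattern, with the same seq/comb split `aggregateByKey` uses; `aggregate_by_key`, `seq_func`, and `comb_func` are hypothetical names, and plain Python stands in for a running Spark cluster.

```python
# Sketch of the aggregateByKey pattern discussed in the thread:
# a sequence function merges one value into a partition's accumulator,
# and a combiner merges two partitions' accumulators.

def seq_func(acc, value):
    # add one value to the accumulator list
    return acc + [value]

def comb_func(list1, list2):
    # merge two accumulator lists; note this must concatenate
    # (list1 + list2), not nest them as [list1] + [list2]
    return list1 + list2

def aggregate_by_key(pairs, zero, seq, comb):
    # emulate RDD.aggregateByKey over (key, value) pairs,
    # treating each pair as if it arrived from its own partition
    out = {}
    for k, v in pairs:
        out[k] = comb(out.get(k, list(zero)), seq(list(zero), v))
    return out

pairs = [("a", 1), ("b", 2), ("a", 3)]
result = aggregate_by_key(pairs, [], seq_func, comb_func)
# result == {"a": [1, 3], "b": [2]}
```

Once the values are collected per key, any Python function can post-process the lists, which covers many of the cases a UDAF would.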

Pyspark - how to use UDFs with dataframe groupby

2016-02-09 Thread Viktor ARDELEAN
Hello,

I am using the following transformations on an RDD:

rddAgg = df.map(lambda l: (Row(a = l.a, b= l.b, c = l.c), l))\
           .aggregateByKey([], lambda accumulatorList, value: accumulatorList + [value],
                           lambda list1, list2: [list1] + [list2])

I want to use the dataframe groupBy + agg
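As a side note (an observation, not raised in the original thread): the combiner in the snippet above, `lambda list1, list2: [list1] + [list2]`, likely has a bug. It wraps each accumulator in another list instead of concatenating them, so results would come out nested whenever more than one partition contributes to a key:

```python
# Demonstrating the difference between nesting and concatenation
# for the combiner in the question's aggregateByKey call.
list1, list2 = [1, 2], [3]

nested = [list1] + [list2]  # what the snippet computes
flat = list1 + list2        # what was presumably intended

print(nested)  # [[1, 2], [3]] -- accumulators wrapped in an extra list
print(flat)    # [1, 2, 3]     -- one flat list of values
```

If flat per-key lists are the goal, `lambda list1, list2: list1 + list2` is presumably the intended combiner.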