Short answer: PySpark does not support UDAFs (user-defined aggregate functions) for now.
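
For the group-into-lists pattern in the quoted code below, though, a built-in aggregate can stand in for a UDAF. Here is a minimal sketch, assuming Spark 1.6+ (where pyspark.sql.functions.collect_list is exposed; on 1.6 it may still require a HiveContext) and made-up columns a, b, c, d:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row
    from pyspark.sql import functions as F

    sc = SparkContext()          # or reuse the shell's `sc`
    sqlContext = SQLContext(sc)  # a HiveContext may be needed on 1.6

    # Made-up rows shaped like the DataFrame in the question;
    # `d` stands in for the per-row payload being collected.
    df = sqlContext.createDataFrame([
        Row(a=1, b=2, c=3, d="x"),
        Row(a=1, b=2, c=3, d="y"),
        Row(a=4, b=5, c=6, d="z"),
    ])

    # groupBy + collect_list gathers each group's values into a list,
    # covering the aggregateByKey-into-lists pattern without a UDAF.
    dfAgg = df.groupBy("a", "b", "c").agg(F.collect_list("d").alias("values"))
    dfAgg.show()

A side benefit: collect_list (and collect_set, which drops duplicates) run inside the JVM, so the rows are not shipped through Python workers the way the lambdas in an RDD aggregateByKey are.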
On Tue, Feb 9, 2016 at 11:44 PM, Viktor ARDELEAN wrote:
> Hello,
>
> I am using the following transformations on an RDD:
>
> from pyspark.sql import Row
>
> rddAgg = df.map(lambda l: (Row(a=l.a, b=l.b, c=l.c), l))\
>     .aggregateByKey([],
>                     lambda accumulatorList, value: accumulatorList + [value],
>                     lambda list1, list2: list1 + list2)
>
> I want to use the DataFrame groupBy + agg instead.
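
For comparison, the RDD pattern above runs end to end with made-up keys and values (reusing `sc` from the sketch earlier); the combiner simply concatenates the partial lists built on each partition:

    # Hypothetical (key, value) pairs standing in for the mapped rows.
    pairs = sc.parallelize([("k1", 1), ("k1", 2), ("k2", 3)])
    rddAgg = pairs.aggregateByKey([],
                                  lambda acc, value: acc + [value],
                                  lambda l1, l2: l1 + l2)
    print(rddAgg.collect())  # e.g. [('k2', [3]), ('k1', [1, 2])]; key order is not guaranteed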