Re: How to concat few rows into a new column in dataframe

2016-01-06 Thread Sabarish Sasidharan
You can just repartition by the id, if the final objective is to have all data for the same key in the same partition. Regards Sab On Wed, Jan 6, 2016 at 11:02 AM, Gavin Yue wrote: > I found that in 1.6 dataframe could do repartition. > > Should I still need to do

Re: How to concat few rows into a new column in dataframe

2016-01-05 Thread Gavin Yue
I tried the Ted's solution and it works. But I keep hitting the JVM out of memory problem. And grouping the key causes a lot of data shuffling. So I am trying to order the data based on ID first and save as Parquet. Is there way to make sure that the data is partitioned that each ID's data is

Re: How to concat few rows into a new column in dataframe

2016-01-05 Thread Gavin Yue
I found that in 1.6 dataframe could do repartition. Should I still need to do orderby first or I just have to repartition? On Tue, Jan 5, 2016 at 9:25 PM, Gavin Yue wrote: > I tried the Ted's solution and it works. But I keep hitting the JVM out > of memory

How to concat few rows into a new column in dataframe

2016-01-05 Thread Gavin Yue
Hey, For example, a table df with two columns id name 1 abc 1 bdf 2 ab 2 cd I want to group by the id and concat the string into array of string. like this id 1 [abc,bdf] 2 [ab, cd] How could I achieve this in dataframe? I stuck on df.groupBy("id"). ??? Thanks

Re: How to concat few rows into a new column in dataframe

2016-01-05 Thread Ted Yu
Something like the following: val zeroValue = collection.mutable.Set[String]() val aggredated = data.aggregateByKey (zeroValue)((set, v) => set += v, (setOne, setTwo) => setOne ++= setTwo) On Tue, Jan 5, 2016 at 2:46 PM, Gavin Yue wrote: > Hey, > > For example, a table

Re: How to concat few rows into a new column in dataframe

2016-01-05 Thread Michael Armbrust
This would also be possible with an Aggregator in Spark 1.6: https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html On Tue, Jan 5, 2016 at 2:59 PM, Ted Yu wrote: > Something like the following: > > val zeroValue =