Hi,
We can use combineByKey to achieve this.

val finalRDD = tempRDD.combineByKey(
  (x: (Any, Any)) => x,
  (acc: (Any, Any), x) => (acc, x),
  (acc1: (Any, Any), acc2: (Any, Any)) => (acc1, acc2))
finalRDD.collect.foreach(println)

(amazon,((book1,tech),(book2,tech)))
(barns&noble,(book,tech))
(eBay,(book1,tech))
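
A typed variant of the same idea, offered as a minimal sketch rather than a definitive implementation. It assumes Spark is on the classpath and runs against a local master; the names `BrandGrouping`, `Product`, and `groupByBrand` are illustrative and do not come from this thread. Accumulating into a List[(String, String)] instead of nesting (Any, Any) tuples keeps the result type at RDD[(String, List[(String, String)])], even when a brand has more than two products.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BrandGrouping {
  // (product, key) pair; the alias is a hypothetical name used for readability.
  type Product = (String, String)

  // Called for the first value seen for a key within a partition.
  val createCombiner: Product => List[Product] = v => List(v)
  // Adds one more value to a partition-local list.
  val mergeValue: (List[Product], Product) => List[Product] = (acc, v) => v :: acc
  // Merges the lists built for the same key on different partitions.
  val mergeCombiners: (List[Product], List[Product]) => List[Product] = (a, b) => a ::: b

  def groupByBrand(rdd: RDD[(String, Product)]): RDD[(String, List[Product])] =
    rdd.combineByKey[List[Product]](createCombiner, mergeValue, mergeCombiners)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("combineByKey-sketch")
    val sc = new SparkContext(conf)
    try {
      val tempRDD: RDD[(String, Product)] = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))
      ))
      // The order of elements inside each list is not guaranteed.
      groupByBrand(tempRDD).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because every value is kept, building per-key lists this way does not shrink the shuffle the way a true reduction (a sum or a count, say) would.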
Thanks,
Sivakumar
-------- Original message --------
From: Daniel Haviv <[email protected]>
Date: 30/03/2016 18:58 (GMT+08:00)
To: Akhil Das <[email protected]>
Cc: Suniti Singh <[email protected]>, [email protected], dev
<[email protected]>
Subject: Re: aggregateByKey on PairRDD
Hi,
Shouldn't groupByKey be avoided
(https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?
Thank you,
Daniel
On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <[email protected]> wrote:
Isn't it what tempRDD.groupByKey does?
Thanks
Best Regards
On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <[email protected]> wrote:
Hi All,
I have an RDD with data in the following form:
tempRDD: RDD[(String, (String, String))], i.e. (brand, (product, key))
("amazon", ("book1", "tech"))
("eBay", ("book1", "tech"))
("barns&noble", ("book", "tech"))
("amazon", ("book2", "tech"))
I would like to group the data by brand and get the result set in the
following format:
resultSetRDD: RDD[(String, List[(String, String)])]
I tried using aggregateByKey but could not work out how to achieve this. Or
is there any other way to achieve this?
val resultSetRDD = tempRDD.aggregateByKey("")(
  { case (aggr, value) => aggr + String.valueOf(value) + "," },
  (aggr1, aggr2) => aggr1 + aggr2)

resultSetRDD = (amazon,("book1","tech"),("book2","tech"))

Thanks,
Suniti
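
One possible way to get a list per brand out of aggregateByKey is to start from an empty list as the zero value instead of an empty string, so that each value is prepended rather than concatenated as text. The following is only a sketch under stated assumptions: Spark is on the classpath, a local master is used, and `AggregateByBrand`, `Product`, `seqOp`, `combOp`, and `aggregate` are illustrative names that do not appear in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AggregateByBrand {
  // (product, key) pair; the alias is a hypothetical name used for readability.
  type Product = (String, String)

  // Zero value: an empty list rather than the empty string used in the attempt above.
  val zero: List[Product] = List.empty[Product]
  // Folds one value into the partition-local list for its key.
  val seqOp: (List[Product], Product) => List[Product] = (acc, value) => value :: acc
  // Merges the lists built for the same key on different partitions.
  val combOp: (List[Product], List[Product]) => List[Product] = (acc1, acc2) => acc1 ::: acc2

  def aggregate(tempRDD: RDD[(String, Product)]): RDD[(String, List[Product])] =
    tempRDD.aggregateByKey(zero)(seqOp, combOp)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregateByKey-sketch")
    val sc = new SparkContext(conf)
    try {
      val tempRDD: RDD[(String, Product)] = sc.parallelize(Seq(
        ("amazon", ("book1", "tech")),
        ("eBay", ("book1", "tech")),
        ("barns&noble", ("book", "tech")),
        ("amazon", ("book2", "tech"))
      ))
      // The order of elements inside each list is not guaranteed.
      aggregate(tempRDD).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```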