Re: Spark GroupBy Save to different files
Hi Arun,

    rdd1.groupBy(_.city)
      .map(s => (s._1, s._2.toList.toString()))
      .toDF("city", "data")
      .write
      .partitionBy("city")
      .csv("/data")

should work for you.

Regards,
Pralabh

On Sat, Sep 2, 2017 at 7:58 AM, Ryan wrote:
> you may try foreachPartition
>
> On Fri, Sep 1, 2017 at 10:54 PM, asethia wrote:
>> I would like to save these group by records in different files (for
>> example by city).
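For readers who want to try the partitionBy suggestion end to end, here is a minimal sketch. It assumes Spark 2.x or later running in local mode; the object name PartitionByCity, the app name, and the output path /tmp/people_by_city are made up for illustration and are not from the thread. It also skips the groupBy step, on the assumption that partitionBy alone is enough to split rows by city.

```scala
import org.apache.spark.sql.SparkSession

// Same record shape as in the original question.
case class Person(fName: String, city: String)

object PartitionByCity {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-by-city") // hypothetical app name
      .master("local[*]")           // assumption: local test run
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))

    // partitionBy sends each row to a city=<value> subdirectory,
    // so no explicit groupBy is needed before writing.
    people.toDS()
      .write
      .mode("overwrite")
      .partitionBy("city")
      .csv("/tmp/people_by_city") // hypothetical output path

    spark.stop()
  }
}
```

With this layout Spark creates one subdirectory per distinct city (city=City1, city=City2) under the output path. The city value is encoded in the directory name, and the part files inside hold the remaining column.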
Re: Spark GroupBy Save to different files
you may try foreachPartition

On Fri, Sep 1, 2017 at 10:54 PM, asethia wrote:
> I would like to save these group by records in different files (for example
> by city).
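The reply does not say how foreachPartition would be used, so the following is only one possible reading, not Ryan's actual code. It assumes local mode, a HashPartitioner with 2 partitions, and plain java.io writes to /tmp/files; the object name ForeachPartitionByCity is hypothetical. The idea is to co-locate each city's records in one partition first, then write one file per city from inside the partition using ordinary I/O, since SparkContext and toDF are not usable inside tasks.

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Paths}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Same record shape as in the original question.
case class Person(fName: String, city: String)

object ForeachPartitionByCity {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("foreach-partition-by-city") // hypothetical app name
      .setMaster("local[*]")                   // assumption: local test run
    val sc = new SparkContext(conf)
    val outDir = "/tmp/files" // local to whichever machine runs the task

    val rdd = sc.parallelize(
      List(Person("A", "City1"), Person("B", "City2"), Person("C", "City1")))

    rdd.keyBy(_.city)
      .partitionBy(new HashPartitioner(2)) // all records of a city share a partition
      .values
      .foreachPartition { people =>
        // Runs inside a task: plain Scala/Java I/O only.
        Files.createDirectories(Paths.get(outDir))
        // toList holds the whole partition in memory; fine for a sketch.
        people.toList.groupBy(_.city).foreach { case (city, ps) =>
          val out = new PrintWriter(s"$outDir/$city.csv")
          try ps.foreach(p => out.println(s"${p.fName},${p.city}"))
          finally out.close()
        }
      }

    sc.stop()
  }
}
```

On a real cluster each file lands on the local disk of the executor that processed that city, so this approach mainly suits local runs or a shared filesystem mounted at the same path on every node.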
Spark GroupBy Save to different files
Hi,

I have a list of person records in the following format:

    case class Person(fName: String, city: String)

    val l = List(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))

    val rdd: RDD[Person] = sc.parallelize(l)

    val groupBy: RDD[(String, Iterable[Person])] = rdd.groupBy(_.city)

I would like to save these group by records in different files (for example by city). Could someone please help me here?

I tried this, but was not able to create those files:

    groupBy.foreach(x => {
      x._2.toList.toDF().rdd.saveAsObjectFile(s"file:///tmp/files/${x._1}")
    })

Thanks,
Arun

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/