Re: Spark GroupBy Save to different files

2017-09-04 Thread Pralabh Kumar
Hi Arun,

rdd.groupBy(_.city)
  .map(s => (s._1, s._2.toList.toString()))
  .toDF("city", "data")
  .write
  .partitionBy("city")
  .csv("/data")

should work for you.
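
For reference, here is a slightly fuller sketch of the partitionBy approach. It skips the groupBy step, since partitionBy already splits the rows by city when writing. The object name PartitionByCity, the writeByCity helper, and the /tmp/people_by_city output directory are assumptions made for illustration; they are not from the thread.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

case class Person(fName: String, city: String)

object PartitionByCity {
  // Hypothetical helper: writes one subdirectory per distinct city under outputDir.
  def writeByCity(spark: SparkSession, people: Seq[Person], outputDir: String): Unit = {
    import spark.implicits._ // brings toDS() into scope for Seq[Person]

    people.toDS()
      .write
      .mode(SaveMode.Overwrite) // replaces outputDir if it already exists
      .partitionBy("city")      // one "city=<value>" directory per city
      .csv(outputDir)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("partition-by-city")
      .master("local[*]")
      .getOrCreate()

    val people = Seq(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))
    writeByCity(spark, people, "/tmp/people_by_city")

    spark.stop()
  }
}
```

With the sample data this should produce one subdirectory per city under the output directory, such as city=City1 and city=City2. The city value is stored in the directory name rather than inside the CSV files.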

Regards
Pralabh



Re: Spark GroupBy Save to different files

2017-09-01 Thread Ryan
You may try foreachPartition.
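
A minimal sketch of what the foreachPartition route could look like, assuming plain CSV text files are acceptable and that each executor may write to its own local disk. The object name SaveByCityPerPartition, the save helper, the baseDir parameter, and the part-<id>.csv naming are hypothetical, not something the thread specifies.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

case class Person(fName: String, city: String)

object SaveByCityPerPartition {
  // Hypothetical helper: each task writes its own records, grouped by city,
  // to <baseDir>/<city>/part-<partitionId>.csv on the machine it runs on.
  def save(rdd: RDD[Person], baseDir: String): Unit =
    rdd.foreachPartition { people =>
      // The partition id keeps tasks from overwriting each other's files,
      // because the same city can appear in several partitions.
      val partitionId = TaskContext.getPartitionId()

      people.toList.groupBy(_.city).foreach { case (city, members) =>
        val dir = Paths.get(baseDir, city)
        Files.createDirectories(dir)

        val lines = members.map(p => s"${p.fName},${p.city}").mkString("\n")
        Files.write(dir.resolve(s"part-$partitionId.csv"), lines.getBytes(StandardCharsets.UTF_8))
      }
    }
}
```

On a real cluster the files end up on the local filesystem of whichever executor ran the task, so this is mainly useful in local mode or when baseDir points at storage shared by all nodes.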



Spark GroupBy Save to different files

2017-09-01 Thread asethia
Hi,

I have a list of person records in the following format:

case class Person(fName: String, city: String)

val l = List(Person("A", "City1"), Person("B", "City2"), Person("C", "City1"))

val rdd: RDD[Person] = sc.parallelize(l)

val groupBy: RDD[(String, Iterable[Person])] = rdd.groupBy(_.city)

I would like to save these grouped records to different files (for example,
one per city). Can someone please help me here?

I tried this, but it does not create those files:

groupBy.foreach(x => {
  x._2.toList.toDF().rdd.saveAsObjectFile(s"file:///tmp/files/${x._1}")
})

Thanks
Arun
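
A possible reason the attempt above fails: the function passed to foreach runs on the executors, while toDF and saveAsObjectFile need the SparkSession and SparkContext, which are only usable on the driver. One workaround, sketched below under that assumption, is to collect the distinct cities on the driver and save one filtered RDD per city. The object name SaveEachCity, the saveEachCity helper, and the baseDir parameter are hypothetical.

```scala
import org.apache.spark.rdd.RDD

case class Person(fName: String, city: String)

object SaveEachCity {
  // Hypothetical helper: runs entirely from the driver, one save job per city.
  def saveEachCity(rdd: RDD[Person], baseDir: String): Unit = {
    rdd.cache() // the RDD is scanned once per city, so keep it in memory

    // Distinct keys are brought back to the driver; assumes there are few of them.
    val cities: Array[String] = rdd.map(_.city).distinct().collect()

    cities.foreach { city =>
      // Each call is an ordinary driver-side action on a filtered RDD.
      rdd.filter(_.city == city).saveAsObjectFile(s"$baseDir/$city")
    }

    rdd.unpersist()
  }
}
```

This launches one Spark job per city, so it suits a small number of distinct cities. saveAsObjectFile also fails if the target directory already exists.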