I am trying to save the files as Parquet.

On Thu, Dec 15, 2016 at 10:41 PM, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> What is the format?
>
>
> ------------------------------
> *From:* KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
> *Sent:* Thursday, December 15, 2016 7:54:27 PM
> *To:* user @spark
> *Subject:* Spark Dataframe: Save to hdfs is taking long time
>
> Hi,
>
> I am facing an issue while saving the dataframe back to HDFS. It's taking a
> long time to run.
>
> val results_dataframe = sqlContext.sql("select gt.*,ct.* from 
> PredictTempTable pt,ClusterTempTable ct,GamificationTempTable gt where 
> gt.vin=pt.vin and pt.cluster=ct.cluster")
> results_dataframe.coalesce(numPartitions)
> results_dataframe.persist(StorageLevel.MEMORY_AND_DISK)
>
> dataFrame.write.mode(saveMode).format(format)
>   .option(Codec, compressCodec) //"org.apache.hadoop.io.compress.snappyCodec"
>   .save(outputPath)
>
> It takes a long time, and the total number of records in this dataframe is
> 4903764.
>
> I even increased the number of partitions from 10 to 20, still no luck. Can
> anyone help me resolve this performance issue?
>
> Thanks,
>
> Asmath
>
>
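
A few things stand out in the quoted snippet. coalesce() returns a new DataFrame, so calling it without keeping the result leaves the original partitioning unchanged. coalesce() can also only reduce the partition count, so going from 10 to 20 would need repartition() instead. And because Spark evaluates lazily, the time reported for the save includes running the three-table join. Below is a minimal sketch of the write path with those points addressed, assuming Spark 1.6/2.x with a SQLContext and Parquet as the target format. The object and method names (SaveResults, saveAsParquet) and SaveMode.Overwrite are placeholders, not something taken from the original job.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}

object SaveResults {
  // Hypothetical helper; table and column names mirror the quoted snippet.
  def saveAsParquet(sqlContext: SQLContext,
                    numPartitions: Int,
                    outputPath: String): Unit = {
    val results: DataFrame = sqlContext.sql(
      """select gt.*, ct.*
        |from PredictTempTable pt, ClusterTempTable ct, GamificationTempTable gt
        |where gt.vin = pt.vin and pt.cluster = ct.cluster""".stripMargin)

    // coalesce returns a new DataFrame, so keep the result and write that one.
    // Use repartition(numPartitions) instead if the goal is MORE partitions.
    val coalesced: DataFrame = results.coalesce(numPartitions)

    // Parquet compression is a SQL setting rather than a Hadoop codec class name.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

    // Assumption: overwrite mode; substitute the job's own save mode.
    coalesced.write.mode(SaveMode.Overwrite).parquet(outputPath)
  }
}
```

The persist() call is left out here because the DataFrame is written only once; caching would mainly help if the same result were reused by later actions. If the write is still slow, the stage timings in the Spark UI should show whether the join or the Parquet output dominates.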
