I am trying to save the files as Parquet.

On Thu, Dec 15, 2016 at 10:41 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> What is the format?
>
> ------------------------------
> *From:* KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
> *Sent:* Thursday, December 15, 2016 7:54:27 PM
> *To:* user @spark
> *Subject:* Spark Dataframe: Save to hdfs is taking long time
>
> Hi,
>
> I am facing an issue while saving the dataframe back to HDFS. It's taking a
> long time to run.
>
> val results_dataframe = sqlContext.sql("select gt.*,ct.* from PredictTempTable pt,ClusterTempTable ct,GamificationTempTable gt where gt.vin=pt.vin and pt.cluster=ct.cluster")
> results_dataframe.coalesce(numPartitions)
> results_dataframe.persist(StorageLevel.MEMORY_AND_DISK)
>
> dataFrame.write.mode(saveMode).format(format)
>   .option(Codec, compressCodec) //"org.apache.hadoop.io.compress.SnappyCodec"
>   .save(outputPath)
>
> It was taking a long time, and the total number of records for this
> dataframe is 4903764.
>
> I even increased the number of partitions from 10 to 20, still no luck. Can
> anyone help me in resolving this performance issue?
>
> Thanks,
>
> Asmath
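
One thing that stands out in the quoted snippet: `coalesce` and `persist` are called on `results_dataframe`, but the `coalesce` result is discarded (DataFrames are immutable, so it returns a new DataFrame) and the write goes through a different variable, `dataFrame`. Below is a rough sketch of how the same steps might be chained for a Parquet write. It is only an illustration under assumptions, not a confirmed fix for the slowness: the object name `SaveResults` and the method `saveResults` are hypothetical, the three temp tables are assumed to be registered already, and Snappy is set through the `spark.sql.parquet.compression.codec` setting instead of the `Codec` option from the original code.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}

// Hypothetical helper, not from the original post.
object SaveResults {

  def saveResults(sqlContext: SQLContext,
                  numPartitions: Int,
                  saveMode: SaveMode,
                  outputPath: String): Unit = {

    // Assumption: Snappy-compressed Parquet is the desired output.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

    // Same join as in the original message; the temp tables must already exist.
    val joined: DataFrame = sqlContext.sql(
      """select gt.*, ct.*
        |from PredictTempTable pt, ClusterTempTable ct, GamificationTempTable gt
        |where gt.vin = pt.vin and pt.cluster = ct.cluster""".stripMargin)

    // coalesce returns a NEW DataFrame, so keep the result and write that one.
    val resultsDataframe: DataFrame = joined.coalesce(numPartitions)

    // persist is left out here: the DataFrame is consumed once by the write.
    resultsDataframe.write
      .mode(saveMode)
      .format("parquet")
      .save(outputPath)
  }
}
```

A call might look like `SaveResults.saveResults(sqlContext, 20, SaveMode.Overwrite, outputPath)`. Note that `coalesce` avoids a shuffle, so the final stage of the join also runs with only `numPartitions` tasks; if that turns out to be the bottleneck, `repartition(numPartitions)` could be tried in its place, at the cost of an extra shuffle.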