Re: Spark output data to S3 is very slow
Tried several times, and it is as slow as before. As a temporary solution, I will have Spark write its output to HDFS and then sync the data to S3. Thank you.

On Sat, Sep 17, 2016 at 10:43 AM, Takeshi Yamamuro wrote:
> Hi,
>
> Have you seen the previous thread?
> https://www.mail-archive.com/user@spark.apache.org/msg56791.html
>
> // maropu

--
*This email may contain or reference confidential information and is intended only for the individual to whom it is addressed. Please refrain from distributing, disclosing or copying this email and the information contained within unless you are the intended recipient. If you received this email in error, please notify us at le...@appannie.com immediately and remove it from your system.*
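The HDFS-then-S3 workaround can be sketched in Python as follows. This is a minimal sketch, assuming the Spark job has already written its output to an HDFS directory and that the `hadoop distcp` command is available on the machine running it. The helper names `build_distcp_command` and `sync_to_s3` and the example paths are hypothetical, and the S3 scheme (`s3://`, `s3n://` or `s3a://`) depends on how the cluster is configured.

```python
import shlex
import subprocess
from typing import List


def build_distcp_command(hdfs_src: str, s3_dst: str, overwrite: bool = False) -> List[str]:
    """Build a `hadoop distcp` invocation that copies an HDFS directory to S3."""
    cmd = ["hadoop", "distcp"]
    if overwrite:
        # -overwrite replaces files that already exist at the destination.
        cmd.append("-overwrite")
    cmd += [hdfs_src, s3_dst]
    return cmd


def sync_to_s3(hdfs_src: str, s3_dst: str, dry_run: bool = True) -> str:
    """Return the shell form of the copy command; run it only when dry_run is False."""
    cmd = build_distcp_command(hdfs_src, s3_dst)
    if not dry_run:
        # check=True raises CalledProcessError if distcp exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical paths; with the default dry_run=True nothing is executed.
    print(sync_to_s3("hdfs:///tmp/job-output", "s3a://example-bucket/job-output"))
```

Keeping the dry run as the default makes it easy to check the generated command before copying anything; distcp runs the copy as a MapReduce job, so the upload to S3 is parallelised rather than done by a single driver.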
Re: Spark output data to S3 is very slow
Hi,

Have you seen the previous thread?
https://www.mail-archive.com/user@spark.apache.org/msg56791.html

// maropu

On Sat, Sep 17, 2016 at 11:34 AM, Qiang Li wrote:
> how can I let spark output directly without temporary directory ?

--
---
Takeshi Yamamuro
Spark output data to S3 is very slow
Hi,

I ran some jobs with Spark 2.0 on YARN. All tasks finished very quickly, but in the last step Spark spent a lot of time renaming or moving data from the S3 temporary directory to the final directory. I then tried setting

spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter

or

spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter

but neither works. It looks like Spark 2.0 removed these configs. How can I make Spark write its output directly, without a temporary directory?
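A minimal sketch of one alternative that is often suggested for Spark 2.0, assuming Hadoop 2.7 or later on the cluster: setting the Hadoop property `mapreduce.fileoutputcommitter.algorithm.version` to `2` (passed through Spark with the `spark.hadoop.` prefix) makes each task move its files into the final directory when the task commits, instead of the driver renaming everything at job commit. On S3 a rename is still a copy, so this should reduce the slow final step rather than remove it, and a job that fails midway can leave partial output in the destination. The helper `to_submit_args` is hypothetical; it only turns the settings into `spark-submit --conf` arguments.

```python
from typing import Dict, List

# Assumption: the cluster runs Hadoop 2.7+, where algorithm version 2 exists.
COMMITTER_CONF: Dict[str, str] = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a settings dict into repeated `--conf key=value` spark-submit arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + to_submit_args(COMMITTER_CONF) + ["your_job.py"]))
```

The same key/value pair can instead be set in code with `SparkSession.builder.config(key, value)` before the session is created.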