Re: Spark output data to S3 is very slow

2016-09-17 Thread Qiang Li
I tried several times and it is still as slow as before. As a temporary
solution, I will have Spark write its output to HDFS and then sync the data
to S3.

Thank you.
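The workaround above (commit on HDFS, then copy the finished files to S3) sidesteps the slow step because an HDFS rename is a NameNode metadata operation, so the commit is cheap, and each finished file then reaches S3 with a single upload instead of a copy plus delete. In practice a tool such as `hadoop distcp` would do the copy. The sketch below only models the idea, assuming a local directory stands in for HDFS and a plain dict stands in for the S3 bucket; `sync_dir_to_store` and the paths are hypothetical.

```python
import os
import tempfile


def sync_dir_to_store(local_root: str, store: dict, prefix: str) -> int:
    """Hypothetical sync step: upload each committed file under local_root once.

    `store` is a dict standing in for an S3 bucket (key -> bytes).
    """
    # Collect committed files, pruning any leftover _temporary attempt directories.
    files = []
    for dirpath, dirnames, filenames in os.walk(local_root):
        dirnames[:] = [d for d in dirnames if d != "_temporary"]
        files += [os.path.join(dirpath, name) for name in filenames]
    # Data files first, the _SUCCESS marker last, so a reader that waits for
    # _SUCCESS does not start on a half-copied directory.
    files.sort(key=lambda p: (os.path.basename(p) == "_SUCCESS", p))
    for path in files:
        key = prefix + os.path.relpath(path, local_root).replace(os.sep, "/")
        with open(path, "rb") as f:
            store[key] = f.read()  # one upload per file; no rename on the store
    return len(files)


# Usage: fake a committed Spark output directory (plus a stale attempt), then sync it.
with tempfile.TemporaryDirectory() as staging:
    for i in range(3):
        with open(os.path.join(staging, f"part-{i:05d}.parquet"), "wb") as f:
            f.write(b"x" * 100)
    open(os.path.join(staging, "_SUCCESS"), "wb").close()
    os.makedirs(os.path.join(staging, "_temporary", "0"))
    with open(os.path.join(staging, "_temporary", "0", "stale"), "wb") as f:
        f.write(b"partial")
    s3 = {}
    uploaded = sync_dir_to_store(staging, s3, "s3://bucket/out/")

print(uploaded)  # 4
```

Uploading `_SUCCESS` last only helps readers that check for the marker; the copy itself is not atomic, so a reader that ignores the marker can still see a partly synced directory.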

On Sat, Sep 17, 2016 at 10:43 AM, Takeshi Yamamuro wrote:

> Hi,
>
> Have you seen the previous thread?
> https://www.mail-archive.com/user@spark.apache.org/msg56791.html
>
> // maropu
>
> --
> ---
> Takeshi Yamamuro
>

-- 
*This email may contain or reference confidential information and is 
intended only for the individual to whom it is addressed.  Please refrain 
from distributing, disclosing or copying this email and the information 
contained within unless you are the intended recipient.  If you received 
this email in error, please notify us at le...@appannie.com 
** immediately and remove it from your system.*


Re: Spark output data to S3 is very slow

2016-09-16 Thread Takeshi Yamamuro
Hi,

Have you seen the previous thread?
https://www.mail-archive.com/user@spark.apache.org/msg56791.html

// maropu



-- 
---
Takeshi Yamamuro


Spark output data to S3 is very slow

2016-09-16 Thread Qiang Li
Hi,


I ran some jobs with Spark 2.0 on YARN and found that all tasks finished very
quickly, but in the last step Spark spent a lot of time renaming or moving data
from the S3 temporary directory to the final directory. I then tried setting

spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
or
spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter

But neither worked; it looks like Spark 2.0 removed these configs. How can
I make Spark write its output directly, without a temporary directory?
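The slow last step can be illustrated with a toy model of Hadoop's `FileOutputCommitter` (which Spark's default Parquet committer builds on). With the default commit algorithm (version 1), each task writes under `_temporary`, task commit renames the task's attempt directory, and job commit then renames every committed file into the destination from the driver, one after another. S3 has no real rename, so each rename becomes a copy plus a delete whose cost grows with the data size. Hadoop 2.7 and later also accept `mapreduce.fileoutputcommitter.algorithm.version=2` (passed from Spark as `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`), where task commit moves files straight into the destination and job commit has nothing left to move, at the cost that a failed job can leave partial output behind. This is a simplified sketch under those assumptions: `ToyObjectStore`, `run_job` and the paths are hypothetical, and task commits run sequentially here although real ones run on the executors.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical flat key -> bytes store with no native rename (S3-like)."""
    objects: dict = field(default_factory=dict)
    bytes_copied: int = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def rename_prefix(self, src: str, dst: str) -> int:
        """Emulate a directory rename: copy each object under src, then delete it."""
        keys = sorted(k for k in self.objects if k.startswith(src))
        for key in keys:
            data = self.objects[key]
            self.objects[dst + key[len(src):]] = data  # copy: cost ~ object size
            del self.objects[key]                      # then delete the source
            self.bytes_copied += len(data)
        return len(keys)


def run_job(store: ToyObjectStore, out: str, task_outputs: dict, algorithm_version: int) -> int:
    """Write one part file per task, commit it, and return files moved at job commit."""
    tmp = out + "_temporary/0/"
    for task_id, data in task_outputs.items():
        attempt = f"{tmp}_temporary/attempt_{task_id}/"
        store.put(f"{attempt}part-{task_id:05d}.parquet", data)
        # Task commit (on the executors in a real job).
        if algorithm_version == 1:
            store.rename_prefix(attempt, f"{tmp}task_{task_id}/")
        else:
            store.rename_prefix(attempt, out)  # version 2: straight to the destination
    # Job commit runs once, in the driver, after all tasks have finished.
    moved = 0
    if algorithm_version == 1:
        for task_id in task_outputs:
            moved += store.rename_prefix(f"{tmp}task_{task_id}/", out)
    store.put(out + "_SUCCESS", b"")
    return moved


tasks = {i: b"x" * 1_000 for i in range(4)}
for version in (1, 2):
    store = ToyObjectStore()
    moved = run_job(store, "s3://bucket/out/", tasks, version)
    print(version, moved, store.bytes_copied)
# Output:
# 1 4 8000
# 2 0 4000
```

In this model version 1 copies every byte twice and leaves all of the second round of copies to the driver, which matches the "tasks finish quickly, then the job sits in the last step" symptom; version 2 still copies each file once, but inside the tasks.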
