Re: Which OutputCommitter to use for S3?

2015-03-25 Thread Pei-Lun Lee
Ash and...@andrewash.com Cc: Josh Rosen rosenvi...@gmail.com; Mingyu Kim m...@palantir.com ; user@spark.apache.org user@spark.apache.org; Aaron Davidson aa...@databricks.com Sent: Saturday, February 21, 2015 7:01 PM Subject: Re: Which OutputCommitter to use for S3? Here

Re: Which OutputCommitter to use for S3?

2015-03-16 Thread Pei-Lun Lee
Message - From: Darin McBeath ddmcbe...@yahoo.com.INVALID To: Mingyu Kim m...@palantir.com; Aaron Davidson ilike...@gmail.com Cc: user@spark.apache.org user@spark.apache.org Sent: Monday, February 23, 2015 3:16 PM Subject: Re: Which OutputCommitter to use for S3? Thanks. I

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Pei-Lun Lee
Sent: Monday, February 23, 2015 3:16 PM Subject: Re: Which OutputCommitter to use for S3? Thanks. I think my problem might actually be the other way around. I'm compiling with hadoop 2, but when I startup Spark, using the ec2 scripts, I don't specify a -hadoop-major-version

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Aaron Davidson
. Darin. - Original Message - From: Darin McBeath ddmcbe...@yahoo.com.INVALID To: Mingyu Kim m...@palantir.com; Aaron Davidson ilike...@gmail.com Cc: user@spark.apache.org user@spark.apache.org Sent: Monday, February 23, 2015 3:16 PM Subject: Re: Which OutputCommitter to use for S3

Re: Which OutputCommitter to use for S3?

2015-02-26 Thread Thomas Demoor
, February 23, 2015 3:16 PM Subject: Re: Which OutputCommitter to use for S3? Thanks. I think my problem might actually be the other way around. I'm compiling with hadoop 2, but when I startup Spark, using the ec2 scripts, I don't specify a -hadoop-major-version and the default is 1. I'm

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Darin McBeath
ilike...@gmail.com Cc: user@spark.apache.org user@spark.apache.org Sent: Monday, February 23, 2015 3:16 PM Subject: Re: Which OutputCommitter to use for S3? Thanks. I think my problem might actually be the other way around. I'm compiling with hadoop 2, but when I startup Spark, using the ec2

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Mingyu Kim
, February 21, 2015 7:01 PM Subject: Re: Which OutputCommitter to use for S3? Here is the class: https://urldefense.proofpoint.com/v2/url?u=https-3A__gist.github.com_aaron dav_c513916e72101bbe14ecd=AwIFaQc=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6o Onmz8r=ennQJq47pNnObsDh-88a9YUrUulcYQoV8giPASqXB84m

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Darin McBeath
can comment on this! Thanks, Mingyu From: Mingyu Kim m...@palantir.com Date: Monday, February 16, 2015 at 1:15 AM To: user@spark.apache.org user@spark.apache.org Subject: Which OutputCommitter to use for S3? HI all, The default OutputCommitter used by RDD, which is FileOutputCommitter

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Darin McBeath
it and post a response. - Original Message - From: Mingyu Kim m...@palantir.com To: Darin McBeath ddmcbe...@yahoo.com; Aaron Davidson ilike...@gmail.com Cc: user@spark.apache.org user@spark.apache.org Sent: Monday, February 23, 2015 3:06 PM Subject: Re: Which OutputCommitter to use for S3? Cool

Re: Which OutputCommitter to use for S3?

2015-02-21 Thread Aaron Davidson
on this! Thanks, Mingyu From: Mingyu Kim m...@palantir.com Date: Monday, February 16, 2015 at 1:15 AM To: user@spark.apache.org user@spark.apache.org Subject: Which OutputCommitter to use for S3? HI all, The default OutputCommitter used by RDD, which is FileOutputCommitter, seems

Re: Which OutputCommitter to use for S3?

2015-02-21 Thread Andrew Ash
if anyone using a special OutputCommitter for S3 can comment on this! Thanks, Mingyu From: Mingyu Kim m...@palantir.com Date: Monday, February 16, 2015 at 1:15 AM To: user@spark.apache.org user@spark.apache.org Subject: Which OutputCommitter to use for S3? HI all, The default

Re: Which OutputCommitter to use for S3?

2015-02-20 Thread Mingyu Kim
user@spark.apache.orgmailto:user@spark.apache.org Subject: Which OutputCommitter to use for S3? HI all, The default OutputCommitter used by RDD, which is FileOutputCommitter, seems to require moving files at the commit step, which is not a constant operation in S3, as discussed in http://mail

Re: Which OutputCommitter to use for S3?

2015-02-20 Thread Josh Rosen
OutputCommitter to use for S3? HI all, The default OutputCommitter used by RDD, which is FileOutputCommitter, seems to require moving files at the commit step, which is not a constant operation in S3, as discussed in http://mail-archives.apache.org/mod_mbox/spark-user/201410.mbox/%3c543e33fa.2000

Which OutputCommitter to use for S3?

2015-02-16 Thread Mingyu Kim
HI all, The default OutputCommitter used by RDD, which is FileOutputCommitter, seems to require moving files at the commit step, which is not a constant operation in S3, as discussed in http://mail-archives.apache.org/mod_mbox/spark-user/201410.mbox/%3c543e33fa.2000...@entropy.be%3E. People