Does S3Guard help with this? I thought it was like S3mper and could help detect eventual consistency problems, but wouldn't help with the committer problem.
rb

On Tue, Feb 21, 2017 at 12:39 PM, Matthew Schauer <matthew.scha...@ibm.com> wrote:

> Thanks for the repo, Ryan! I had heard that Netflix had a committer that
> used the local filesystem as a temporary store, but I wasn't able to find
> that anywhere until now. I implemented something similar that writes to
> HDFS and then copies to S3, but it doesn't use the multipart upload API, so
> I'm sure yours will be faster. I think this is the best thing until S3Guard
> comes out.
>
> As far as my UUID-tracking approach goes, I was under the impression that a
> given task would write the same set of files on each attempt. Thus, if the
> task fails, either the whole job is aborted and the files are removed, or
> the task is retried and the files are overwritten. On the other hand, I can
> see how having partially-written data visible to readers immediately could
> cause problems, and that is a good reason to avoid my approach.
>
> Steve -- that design document was a very enlightening read. I will be
> interested in following and possibly contributing to S3Guard in the future.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Output-Committers-for-S3-tp21033p21041.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

--
Ryan Blue
Software Engineer
Netflix