On Wed, 23 Sep 2020 at 20:07, Igor Dvorzhak <i...@google.com.invalid> wrote:
> What will be the solution for object stores to have fast and correct
> commit algorithms?
>

https://github.com/steveloughran/zero-rename-committer/releases/tag/tag_draft_006

There's a plugin point for you to add an explicit committer for gcs:

A key thing is: what atomic operations does your store have?

1. HDFS has rename and create-no-overwrite
2. S3 has only PUT/complete multipart upload, and no fail-if-exists checks

> On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>> I've got a PR up to completely remove the v2 commit algorithm
>>
>> https://github.com/apache/hadoop/pull/2320
>>
>> That may seem overkill, but while *we* know there's a small window of risk
>> (task attempt 1 failing partway through a nonatomic commit), that's not
>> known/appreciated by others.
>>
>> The patch removes the v2 codepath from FileOutputCommitter, making it a
>> lot less complicated, and when v2 is requested, a warning is printed and the
>> option ignored.
>>
>> Overkill? Maybe. But it guarantees correctness
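
For anyone wondering why the "what atomic operations does your store have?" question matters, here is a rough sketch of the two HDFS-style primitives (rename and create-no-overwrite) next to a blind PUT-style overwrite. It runs against a local temp directory, not HDFS, S3 or GCS, and every name in it (`create_no_overwrite`, `put_overwrite`, `demo`, the file names) is made up for illustration; none of this is Hadoop committer code.

```python
import os
import tempfile


def create_no_overwrite(path: str, data: bytes) -> bool:
    """Create `path` only if it does not exist yet; return False if it does."""
    try:
        # O_CREAT | O_EXCL makes "check for existence" and "create" one atomic step.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True


def put_overwrite(path: str, data: bytes) -> None:
    """PUT-style write: no fail-if-exists check, so the last writer wins."""
    with open(path, "wb") as f:
        f.write(data)


def demo() -> dict:
    """Hypothetical walk-through on a local temp dir (an assumption, not a real store)."""
    with tempfile.TemporaryDirectory() as root:
        # Two task attempts race for the same marker; only the first one succeeds.
        marker = os.path.join(root, "commit-marker")
        first = create_no_overwrite(marker, b"attempt-1")
        second = create_no_overwrite(marker, b"attempt-2")
        with open(marker, "rb") as f:
            kept = f.read()

        # With overwrite semantics the second attempt silently replaces the first.
        obj = os.path.join(root, "object")
        put_overwrite(obj, b"attempt-1")
        put_overwrite(obj, b"attempt-2")
        with open(obj, "rb") as f:
            last = f.read()

        # Rename publishes a whole attempt directory in a single operation.
        attempt = os.path.join(root, "attempt_1")
        os.mkdir(attempt)
        put_overwrite(os.path.join(attempt, "part-0000"), b"rows")
        dest = os.path.join(root, "committed")
        os.rename(attempt, dest)

        return {
            "first": first,
            "second": second,
            "kept": kept,
            "last": last,
            "published": sorted(os.listdir(dest)),
            "attempt_gone": not os.path.exists(attempt),
        }


if __name__ == "__main__":
    print(demo())
```

With create-no-overwrite the losing attempt is told it lost; with overwrite-only semantics neither attempt finds out, which is why a committer for such a store has to get its exclusivity from somewhere else.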