I think the conclusion is "no change for now", but people do need to
understand the risks better. One thing I'd like to understand is: which
FileOutputFormat subclasses generate unique filenames which differ
between task attempts? I've heard a mention of Avro here, but haven't
looked in the code.
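
To show why the filename question matters, here is a toy model of the
failure window, assuming a v2-style task commit that moves files one at a
time straight into the destination directory. It is not the
FileOutputCommitter code; the function names, the file names and the
"fail after one file" behaviour are all made up for illustration.

```python
import os
import shutil
import tempfile


def write_attempt(attempt_dir, names):
    """Write one small file per name into a task attempt's working directory."""
    os.makedirs(attempt_dir)
    for name in names:
        with open(os.path.join(attempt_dir, name), "w") as f:
            f.write("data from " + os.path.basename(attempt_dir))


def v2_style_task_commit(attempt_dir, dest_dir, fail_after=None):
    """Move files one at a time straight into dest_dir (not atomic).

    fail_after is a hypothetical knob that simulates the task dying
    partway through its commit.
    """
    for i, name in enumerate(sorted(os.listdir(attempt_dir))):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("simulated task failure during commit")
        os.replace(os.path.join(attempt_dir, name), os.path.join(dest_dir, name))


def run_job(names_for_attempt):
    """Attempt 1 fails after committing one file; attempt 2 commits fully."""
    root = tempfile.mkdtemp()
    try:
        dest = os.path.join(root, "dest")
        os.makedirs(dest)
        for attempt, fail_after in ((1, 1), (2, None)):
            attempt_dir = os.path.join(root, "attempt_%d" % attempt)
            write_attempt(attempt_dir, names_for_attempt(attempt))
            try:
                v2_style_task_commit(attempt_dir, dest, fail_after)
            except RuntimeError:
                pass  # a real framework would schedule another attempt
        return sorted(os.listdir(dest))
    finally:
        shutil.rmtree(root)


def stable_names(attempt):
    # Same filenames in every attempt: a retry overwrites partial output.
    return ["part-00000-a", "part-00000-b"]


def unique_names(attempt):
    # Filenames that differ per attempt: partial output is never overwritten.
    return ["part-00000-a-attempt%d" % attempt, "part-00000-b-attempt%d" % attempt]


if __name__ == "__main__":
    print(run_job(stable_names))
    print(run_job(unique_names))
```

With stable names the second attempt overwrites whatever the first one
left behind, so the destination holds exactly one copy of each file. With
per-attempt names the file committed by the failed attempt stays in the
destination next to the second attempt's output.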

On Thu, 24 Sep 2020 at 17:27, epa...@apache.org <epa...@apache.org> wrote:

> Thanks Steve and Jim for bringing this issue to our attention.
>
> IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very
> quick. With this kind of performance difference, is it wise to change the
> default behavior for released versions of Hadoop? Should this be limited to
> trunk?
>
> Thanks,
> -Eric Payne
>
>
> On Wednesday, September 23, 2020, 2:16:14 PM CDT, Jim Brennan
> <james.bren...@verizonmedia.com.invalid> wrote:
>
> I replied in the Jira.  The speed up provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo).  Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115
>
> I think it would be appropriate to better document the limitations of the
> v2 algorithm and possibly make it not be the default, as long as we can
> still use it.
>
> On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <i...@google.com.invalid>
> wrote:
>
> > What will be the solution for object stores to have fast and correct
> > commit algorithms?
> >
> > On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> > <ste...@cloudera.com.invalid> wrote:
> >
> >> I've got a PR up to completely remove the v2 commit algorithm
> >>
> >> https://github.com/apache/hadoop/pull/2320
> >>
> >> That may seem overkill, but while *we* know there's a small window of risk
> >> (task attempt 1 failing partway through a nonatomic commit), that's not
> >> known/appreciated by others.
> >>
> >> The patch removes the v2 codepath from FileOutputCommitter, making it a lot
> >> less complicated, and when v2 is requested, a warning is printed and the
> >> option ignored.
> >>
> >> Overkill? Maybe. But it guarantees correctness.
> >>
> >
>
