On Wed, 23 Sep 2020 at 20:16, Jim Brennan <james.bren...@verizonmedia.com.invalid> wrote:
> I replied in the Jira. The speed up provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo). Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115
>
> I think it would be appropriate to better document the limitations of the
> v2 algorithm and possibly make it not be the default, as long as we can
> still use it.
>

What about:

- change default
- log @ WARN in job setup (but not tasks)

People like yourself, aware of and happy with the risk, can carry on, but
everyone else gets a warning of risk.

I could also have a special log for the warning so you can turn it off...

>
> On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <i...@google.com.invalid>
> wrote:
>
> > What will be the solution for object stores to have fast and correct
> > commit algorithms?
> >
> > On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> > <ste...@cloudera.com.invalid> wrote:
> >
> >> I've got a PR up to completely remove the v2 commit algorithm
> >>
> >> https://github.com/apache/hadoop/pull/2320
> >>
> >> That may seem overkill, but while *we* know there's a small window of
> >> risk (task attempt 1 failing partway through a nonatomic commit), that's
> >> not known/appreciated by others.
> >>
> >> The patch removes the v2 codepath from FileOutputCommitter, making it a
> >> lot less complicated, and when v2 is requested, a warning is printed and
> >> the option ignored.
> >>
> >> Overkill? Maybe. But it guarantees correctness
> >>
> >
>
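
To make the proposal in the reply concrete (change the default, warn once in job setup but not in tasks, and use a dedicated logger so the warning can be turned off), here is a rough sketch. It is not Hadoop code and not what any patch does: it uses a plain Map in place of a Hadoop Configuration and java.util.logging in place of Hadoop's logging. The property name mapreduce.fileoutputcommitter.algorithm.version is the real one; the class name, the logger name and the setupJob/resolveVersion helpers are all made up for illustration.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; none of these names exist in Hadoop
// apart from the configuration key.
class CommitAlgorithmWarning {
    // Real Hadoop property that selects the FileOutputCommitter algorithm.
    public static final String ALGORITHM_VERSION_KEY =
        "mapreduce.fileoutputcommitter.algorithm.version";

    // Assumption: the proposed new default is v1; v2 must be requested explicitly.
    public static final int DEFAULT_VERSION = 1;

    // Assumption: a dedicated logger (name invented here) so that the warning
    // can be silenced without touching any other log output.
    public static final Logger V2_WARNING_LOG =
        Logger.getLogger("example.commit.v2.warning");

    // Reads the requested algorithm version, falling back to the default.
    public static int resolveVersion(Map<String, String> conf) {
        String value = conf.get(ALGORITHM_VERSION_KEY);
        return value == null ? DEFAULT_VERSION : Integer.parseInt(value.trim());
    }

    // Meant to be called once from job setup, never from task setup.
    // Returns true if a warning was actually emitted.
    public static boolean setupJob(Map<String, String> conf) {
        if (resolveVersion(conf) == 2 && V2_WARNING_LOG.isLoggable(Level.WARNING)) {
            V2_WARNING_LOG.warning(
                "v2 commit algorithm requested: a task attempt failing partway "
                + "through its nonatomic commit can leave partial output");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        setupJob(Map.of());                            // default (v1): silent
        setupJob(Map.of(ALGORITHM_VERSION_KEY, "2"));  // explicit v2: warns
        V2_WARNING_LOG.setLevel(Level.OFF);            // user opts out of the warning
        setupJob(Map.of(ALGORITHM_VERSION_KEY, "2"));  // explicit v2: silent
    }
}
```

Users who accept the risk keep v2 by setting the property, and can silence the warning by turning that one logger off; everyone else gets v1 by default.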