I think the conclusion is "no change for now", but people do need to understand the risks better. One thing I'd like to understand is: which FileOutputFormat subclasses generate unique filenames that differ between task attempts? I've heard Avro mentioned here, but I have not looked at the code.
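To make the filename question concrete, here is a toy Python model of the failure window. It is not Hadoop code: `commit_task_v2`, `run`, the `part-0000N` names, and the `-attempt` suffix are all hypothetical, and the "destination directory" is just a dict. It assumes a v2-style task commit that moves files into the final directory one at a time, with attempt 1 dying partway through and attempt 2 then committing in full.

```python
def commit_task_v2(dest, attempt_files, fail_after=None):
    """Move one attempt's files into dest one by one (non-atomic).

    fail_after=N simulates the attempt dying after N files were moved.
    """
    for i, name in enumerate(sorted(attempt_files)):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("task attempt failed mid-commit")
        dest[name] = attempt_files[name]


def run(unique_names):
    """Attempt 1 fails after two files; attempt 2 then commits fully."""
    dest = {}

    def files(attempt):
        # Hypothetical naming: a per-attempt suffix only when unique_names is set.
        suffix = f"-attempt{attempt}" if unique_names else ""
        return {f"part-0000{i}{suffix}": f"data from attempt {attempt}"
                for i in range(3)}

    try:
        commit_task_v2(dest, files(1), fail_after=2)
    except RuntimeError:
        pass  # the framework would now schedule attempt 2
    commit_task_v2(dest, files(2))
    return dest


stable = run(unique_names=False)  # attempt 2 overwrites attempt 1's files
unique = run(unique_names=True)   # attempt 1's partial output is left behind
stale = [name for name, data in unique.items() if data.endswith("attempt 1")]
```

In this model, stable filenames let the second attempt overwrite the partial output, while per-attempt filenames leave the first attempt's two files sitting next to the second attempt's three.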
On Thu, 24 Sep 2020 at 17:27, [email protected] <[email protected]> wrote:

> Thanks Steve and Jim for bringing this issue to our attention.
>
> IIUC, Serial commit takes minutes with mrv1, whereas with mrv2 it is very
> quick. With this kind of performance difference, is it wise to change the
> default behavior for released versions of Hadoop? Should this be limited
> to trunk?
>
> Thanks,
> -Eric Payne
>
>
> On Wednesday, September 23, 2020, 2:16:14 PM CDT, Jim Brennan
> <[email protected]> wrote:
>
> I replied in the Jira. The speed up provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo). Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115
>
> I think it would be appropriate to better document the limitations of the
> v2 algorithm and possibly make it not be the default, as long as we can
> still use it.
>
> On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <[email protected]>
> wrote:
>
> > What will be the solution for object stores to have fast and correct
> > commit algorithms?
> >
> > On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> > <[email protected]> wrote:
> >
> >> I've got a PR up to completely remove the v2 commit algorithm
> >>
> >> https://github.com/apache/hadoop/pull/2320
> >>
> >> That may seem overkill, but while *we* know there's a small window of
> >> risk (task attempt 1 failing partway through a nonatomic commit),
> >> that's not known/appreciated by others.
> >>
> >> The patch removes the v2 codepath from FileOutputCommitter, making it
> >> a lot less complicated, and when v2 is requested, a warning is printed
> >> and the option ignored.
> >>
> >> Overkill? Maybe. But it guarantees correctness
> >>
