Thanks Steve and Jim for bringing this issue to our attention.

IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very
quick. With this kind of performance difference, is it wise to change the
default behavior for released versions of Hadoop? Should this be limited to
trunk?

Thanks,
-Eric Payne


On Wednesday, September 23, 2020, 2:16:14 PM CDT, Jim Brennan 
<james.bren...@verizonmedia.com.invalid> wrote: 

I replied in the Jira. The speedup provided by the v2 commit algorithm
is very important to us at Verizon Media (Yahoo).  Please do not remove it.
I referred to this comment from Jason Lowe on the original Jira:
https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115

I think it would be appropriate to better document the limitations of the
v2 algorithm and possibly make it not be the default, as long as we can
still use it.

On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <i...@google.com.invalid>
wrote:

> What will be the solution for object stores to have fast and correct
> commit algorithms?
>
> On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>> I've got a PR up to completely remove the v2 commit algorithm
>>
>> https://github.com/apache/hadoop/pull/2320
>>
>> That may seem overkill, but while *we* know there's a small window of risk
>> (task attempt 1 failing partway through a nonatomic commit), that's not
>> known/appreciated by others.
>>
>> The patch removes the v2 codepath from FileOutputCommitter, making it a lot
>> less complicated, and when v2 is requested, a warning is printed and the
>> option ignored.
>>
>> Overkill? Maybe. But it guarantees correctness.
>>
>

