Steve Loughran created MAPREDUCE-7282:
-----------------------------------------
             Summary: MR v2 commit algorithm is dangerous, should be deprecated and not the default
                 Key: MAPREDUCE-7282
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7282
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 3.1.3, 3.2.1, 3.3.0, 3.3.1
            Reporter: Steve Loughran


The v2 MR commit algorithm moves files from the task attempt dir into the dest dir on task commit, one by one.

It is therefore not atomic:

# If a task commit fails partway through and another task attempt commits, then unless exactly the same filenames are used, output of the first attempt may be included in the final result.
# If a worker partitions partway through task commit and then continues after another attempt has committed, it may partially overwrite the output, even when the filenames are the same.

Both MR and Spark assume that task commits are atomic. Either they need to account for this not being the case (we add a way to probe for a committer supporting atomic task commit, and both engines add handling for task commit failures, probably by failing the job), or, better, we remove v2 as the default and maybe also warn when it is being used.
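
The two failure modes above can be reproduced with a small local simulation. This is only a sketch: it uses plain Python and a temporary local directory rather than Hadoop or a real FileOutputCommitter, and every name in it (`commit_task_v2_style`, `write_attempt`, `SimulatedCrash`, the `fail_after` knob, the `part-*` filenames) is hypothetical and chosen for illustration. The only property it borrows from the description is that task commit moves files into the dest dir one by one.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for a task attempt failing or partitioning mid-commit."""


def commit_task_v2_style(attempt_dir, dest_dir, fail_after=None):
    """Move files from attempt_dir into dest_dir one rename at a time.

    fail_after is a hypothetical test knob: stop after that many files
    have been moved, leaving the rest behind in attempt_dir.
    """
    moved = 0
    for name in sorted(os.listdir(attempt_dir)):
        if fail_after is not None and moved >= fail_after:
            raise SimulatedCrash(f"stopped after {moved} file(s)")
        # os.replace overwrites an existing destination file.
        os.replace(os.path.join(attempt_dir, name),
                   os.path.join(dest_dir, name))
        moved += 1


def write_attempt(root, attempt_id, filenames):
    """Create a task attempt dir whose files record which attempt wrote them."""
    attempt_dir = os.path.join(root, "_temporary", attempt_id)
    os.makedirs(attempt_dir)
    for name in filenames:
        with open(os.path.join(attempt_dir, name), "w") as f:
            f.write(f"written by {attempt_id}\n")
    return attempt_dir


def demo_different_names():
    """Case 1: attempt 1 fails partway; attempt 2 uses other filenames."""
    with tempfile.TemporaryDirectory() as root:
        dest = os.path.join(root, "dest")
        os.makedirs(dest)
        a1 = write_attempt(root, "attempt_1", ["part-0000-a1", "part-0001-a1"])
        try:
            commit_task_v2_style(a1, dest, fail_after=1)
        except SimulatedCrash:
            pass
        a2 = write_attempt(root, "attempt_2", ["part-0000-a2", "part-0001-a2"])
        commit_task_v2_style(a2, dest)
        return sorted(os.listdir(dest))


def demo_same_names():
    """Case 2: attempt 1 stalls, attempt 2 commits, attempt 1 resumes."""
    with tempfile.TemporaryDirectory() as root:
        dest = os.path.join(root, "dest")
        os.makedirs(dest)
        names = ["part-0000", "part-0001"]
        a1 = write_attempt(root, "attempt_1", names)
        try:
            commit_task_v2_style(a1, dest, fail_after=1)
        except SimulatedCrash:
            pass
        a2 = write_attempt(root, "attempt_2", names)
        commit_task_v2_style(a2, dest)
        # Attempt 1 wakes up and finishes moving what it still holds.
        commit_task_v2_style(a1, dest)
        result = {}
        for name in sorted(os.listdir(dest)):
            with open(os.path.join(dest, name)) as f:
                result[name] = f.read().strip()
        return result


if __name__ == "__main__":
    print(demo_different_names())
    print(demo_same_names())
```

In the first case the dest dir ends up holding a stray file from the failed attempt next to the complete output of the second attempt. In the second case the final output mixes files from both attempts even though the filenames match. In Hadoop itself the algorithm is selected through the `mapreduce.fileoutputcommitter.algorithm.version` property, so setting it to 1 is the usual way to avoid v2 where the destination store supports it.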