[ https://issues.apache.org/jira/browse/HDFS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668993#comment-13668993 ]
Konstantin Shvachko commented on HDFS-4849:
-------------------------------------------

Given Matthew's comment, I think I should have provided more motivation for the issue first.

The idea for these changes comes from the desire to have MR and YARN run jobs without interruption in the HA case. Today, if the NameNode dies and failover to the StandbyNode occurs, some jobs can fail. This mostly depends on whether the NN failure happened during an idempotent or a non-idempotent operation.

Idempotent operations, like getBlockLocations or addBlock, are retried, and the client will eventually complete such an operation via the StandbyNode once the SBN becomes active. Non-idempotent operations, like create and delete, are not retried; they just fail. Therefore, an MR job fails if it tries to create an output file for a reducer, or delete a directory at the cleanup stage, just at the moment the NN crashes. If it could retry the create on the SBN, it would have succeeded.

So we might need to compromise and loosen the semantics of some HDFS operations in order to satisfy stricter availability and scalability requirements. And we had better do it now, before the APIs are frozen for branch 2.

> Idempotent create, append and delete operations.
> ------------------------------------------------
>
>                 Key: HDFS-4849
>                 URL: https://issues.apache.org/jira/browse/HDFS-4849
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>
> create, append and delete operations can be made idempotent. This will reduce
> the chances of job or other app failures when the NN fails over.
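
To illustrate the retry behaviour described in the comment, here is a minimal toy sketch in Python. It is not the HDFS client code: the names invoke_with_failover, NameNodeUnavailable, make_rpc, "nn1" and "nn2" are all hypothetical, and the model assumes only that a call to a crashed NameNode raises an error. An idempotent call is retried on the next NameNode, while a non-idempotent call fails immediately.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class NameNodeUnavailable(Exception):
    """Hypothetical error: the contacted NameNode did not answer."""


def invoke_with_failover(
    namenodes: Sequence[str],
    operation: Callable[[str], T],
    idempotent: bool,
) -> T:
    """Try `operation` on each NameNode in order; fail over only if idempotent."""
    if not namenodes:
        raise ValueError("at least one NameNode is required")
    last_error: Exception = NameNodeUnavailable("no NameNode tried")
    for node in namenodes:
        try:
            return operation(node)
        except NameNodeUnavailable as error:
            last_error = error
            # A non-idempotent call may already have been applied before the
            # crash, so the client cannot safely repeat it: give up at once.
            if not idempotent:
                raise
    # Every NameNode was tried and none answered.
    raise last_error


def make_rpc(name: str) -> Callable[[str], str]:
    """Build a fake RPC that fails on the crashed node "nn1" (assumption)."""
    def rpc(node: str) -> str:
        if node == "nn1":
            raise NameNodeUnavailable(node)
        return f"{name} served by {node}"
    return rpc


nodes = ["nn1", "nn2"]  # nn1 was active and has crashed; nn2 is the standby

# Idempotent operation: retried and completed via the standby.
result = invoke_with_failover(nodes, make_rpc("getBlockLocations"), idempotent=True)

# Non-idempotent operation: not retried, so the caller sees the failure.
try:
    invoke_with_failover(nodes, make_rpc("create"), idempotent=False)
    create_failed = False
except NameNodeUnavailable:
    create_failed = True
```

In this toy model, making create idempotent corresponds to passing idempotent=True for it, which is what would let the reducer's output-file creation succeed on the standby.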