[ https://issues.apache.org/jira/browse/HADOOP-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947241#comment-17947241 ]
Manish Bhatt edited comment on HADOOP-17215 at 4/25/25 7:12 AM:
----------------------------------------------------------------

[~mthakur] Sharing an example which can lead to data loss:

Let's suppose we sent a request to create a file with overwrite = true:

1. The request got stuck at the server, timed out, and the driver retried the request. The retried request succeeded, and some more data got appended to the file.
2. The initial request, which was previously stuck, also got processed. Since the overwrite flag was true, it created a new file, causing the data appended to the old file to be lost.

With this change, we will first try to create the resource with the overwrite flag set to false. This way, even if the request gets stuck, it won't create another file, and no data will be lost. If the first attempt with the overwrite flag set to false fails due to an HTTP_CONFLICT, we then create the resource with the overwrite flag set to true.

was (Author: JIRAUSER306911):

[~mthakur] Sharing an example which can lead to data loss:

Let's suppose we sent a request to create a file with overwrite = true:

1. The request got stuck at the server, timed out, and the driver retried the request. The retried request succeeded, and some more data got appended to the file.
2. The initial request, which was previously stuck, also got processed. Since the overwrite flag was true, it created a new file, causing the data appended to the old file to be lost.

With this change, we first attempt to create the resource with the overwrite flag set to false. So even if the request gets stuck, it will not create another file, and no data will be lost.
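To make the timeline in the comment above concrete, here is a minimal in-memory sketch of the late-arriving create. The class `LateCreateRaceDemo`, its methods, and the path `/f` are hypothetical stand-ins for illustration only; this is not the ABFS driver or the server API. It assumes the store returns 409 when the path already exists and overwrite is false.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the store; not the ABFS API. */
final class LateCreateRaceDemo {
    private final Map<String, StringBuilder> files = new HashMap<>();

    /** Returns 201 on success, or 409 if the path exists and overwrite is false. */
    int create(String path, boolean overwrite) {
        if (files.containsKey(path) && !overwrite) {
            return 409;
        }
        // Creating with overwrite=true replaces any existing content.
        files.put(path, new StringBuilder());
        return 201;
    }

    void append(String path, String data) {
        files.get(path).append(data);
    }

    String read(String path) {
        return files.get(path).toString();
    }

    /**
     * Replays the timeline from the comment. overwriteFlag is the flag
     * carried by both the stuck original request and its retry.
     */
    static String replay(boolean overwriteFlag) {
        LateCreateRaceDemo store = new LateCreateRaceDemo();
        store.create("/f", overwriteFlag); // step 1: the retry succeeds first
        store.append("/f", "payload");     // step 1: data is appended
        store.create("/f", overwriteFlag); // step 2: the stuck original is processed late
        return store.read("/f");
    }

    public static void main(String[] args) {
        System.out.println("overwrite=true  -> \"" + replay(true) + "\"");
        System.out.println("overwrite=false -> \"" + replay(false) + "\"");
    }
}
```

With the flag set to true, the late request replaces the file and the appended data is gone; with the flag set to false, the late request is rejected with a conflict and the data survives.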
> ABFS: Support for conditional overwrite
> ---------------------------------------
>
>                 Key: HADOOP-17215
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17215
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>   Affects Versions: 3.3.0
>            Reporter: Sneha Vijayarajan
>            Assignee: Sneha Vijayarajan
>            Priority: Major
>              Labels: abfsactive
>             Fix For: 3.3.1, 3.4.0
>
>
> Filesystem Create APIs that do not accept an argument for the overwrite flag
> end up defaulting it to true.
> We are observing a higher request count for creates with overwrite=true,
> primarily because the called Create API defaults the flag to true. When a
> create with overwrite ends up timing out, we have observed that it could lead
> to race conditions between the first create and the retried one running
> almost in parallel.
> To avoid this scenario for a create with overwrite=true request, the ABFS
> driver will always attempt to create without overwrite first. If the create
> fails due to fileAlreadyPresent, it will resend the request with
> overwrite=true.
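The create flow described in the issue (attempt without overwrite, then resend with overwrite=true only on a conflict) can be sketched roughly as follows. This is a simplified illustration, not the driver's code: `CreateClient`, `FakeClient`, and `createWithConditionalOverwrite` are hypothetical names, and the client is assumed to report the outcome as an HTTP status code (201 created, 409 conflict).

```java
import java.util.HashSet;
import java.util.Set;

final class ConditionalCreateSketch {
    static final int HTTP_CREATED = 201;
    static final int HTTP_CONFLICT = 409;

    /** Hypothetical client interface; returns the HTTP status of the create call. */
    interface CreateClient {
        int createPath(String path, boolean overwrite);
    }

    /** In-memory fake used only to exercise the flow. */
    static final class FakeClient implements CreateClient {
        final Set<String> existing = new HashSet<>();
        int overwriteTrueCalls = 0;

        @Override
        public int createPath(String path, boolean overwrite) {
            if (overwrite) {
                overwriteTrueCalls++;
            }
            if (!overwrite && existing.contains(path)) {
                return HTTP_CONFLICT; // path already present
            }
            existing.add(path);
            return HTTP_CREATED;
        }
    }

    /**
     * First attempts the create with overwrite=false. Only if that attempt
     * reports a conflict is the request resent with overwrite=true.
     */
    static int createWithConditionalOverwrite(CreateClient client, String path) {
        int status = client.createPath(path, false);
        if (status == HTTP_CONFLICT) {
            status = client.createPath(path, true);
        }
        return status;
    }
}
```

In this sketch a create on a fresh path never sends overwrite=true, so a delayed copy of that first request cannot replace a file that was written in the meantime; the overwrite=true request is only issued when the path is already known to exist.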