[
https://issues.apache.org/jira/browse/HADOOP-16490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901946#comment-16901946
]
Steve Loughran commented on HADOOP-16490:
-----------------------------------------
Update
Here are some traces of "hadoop fs -copyFromLocal" without the -d option
encountering a failure from a cached 404, while other requests are succeeding.
If I include HADOOP-13884 in this patch we'd eliminate the first of these HEAD
requests, but #2 comes from {{CommandWithDestination.create()}} attempting to
register the stream it has just opened for deleteOnExit().
FileSystem.deleteOnExit then does an exists() and, iff that returns true,
queues the delete. In this case that generates a 404 (the file doesn't exist
yet), which is going to be cached.
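For reference, the registration path is just an existence probe plus adding the path to a set; here's a rough paraphrase of {{FileSystem.deleteOnExit()}} from memory (field names approximate), showing where the HEAD comes from:
{code}
// Rough paraphrase of FileSystem.deleteOnExit(); the exists() probe is the
// call which issues a HEAD against the not-yet-written path, so S3 gets a
// 404 to cache for it.
public boolean deleteOnExit(Path f) throws IOException {
  if (!exists(f)) {              // HEAD -> 404: file hasn't been written yet
    return false;
  }
  synchronized (deleteOnExit) {
    deleteOnExit.add(f);         // queued for deletion in processDeleteOnExit()
  }
  return true;
}
{code}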
This is the trace with S3Guard enabled, auth mode = false. We get the record
from DDB, so the rename(source, dest) gets as far as the copy operation before
hitting the FNFE, and then, while it retries, never gets the 200 + data.
Either this is a long-lived cached entry, or the HEADs taking place in the
retry loop are too close together and the 404 entry keeps being refreshed.
{code}
10:18:24,181 HEAD jar1.jar._COPYING_ (S3aFS.create )
10:18:24,944 HEAD jar1.jar._COPYING_ deleteOnExit() registration
10:18:25,733 PUT jar1.jar._COPYING_ (out
10:18:26,740 HEAD jar1.jar._COPYING_ as out of band check at the start of rename
10:18:27,411 HEAD jar1.jar._COPYING_ as preamble to COPY
...repeated until 10:18:38,939 and final failure
{code}
So there are only 3 seconds from the first HEAD to those of the copy operation:
the copy isn't retrying for long enough, and its retries are close enough
together that the cached 404 entry keeps getting refreshed.
Other files uploaded fine
{code}
rm.txt._COPYING_
2019-08-05 10:10:03,094 HEAD -> 404
2019-08-05 10:10:03,375 HEAD -> 404
2019-08-05 10:10:03,700 PUT -> 200
2019-08-05 10:10:04,669 HEAD -> 200
2019-08-05 10:10:05,219 HEAD -> 200
2019-08-05 10:10:05,565 COPY -> 200
viewfs_rm.txt_COPYING_
2019-08-05 10:10:06,495 HEAD -> 404
2019-08-05 10:10:06,738 HEAD -> 404
2019-08-05 10:10:07,336 PUT -> 200
2019-08-05 10:10:08,119 HEAD -> 200
2019-08-05 10:10:08,799 HEAD -> 200
2019-08-05 10:10:09,058 COPY -> 200
{code}
Thoughts
# Eliminating the HEAD in open() will save time and avoid a 404 arising there
# but the deleteOnExit registration attempt will issue a HEAD anyway, so the
caching problem will still surface
# rather than just adding a special S3Guard retry, let's change the retry
policy to exponential and pick a set of numbers which back off nicely, e.g.
1000ms + 6 retries, giving roughly 64s of total retry and a ~30s gap at the
last one? (See the arithmetic sketch below.)
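To sanity-check those numbers, a tiny standalone sketch of a doubling backoff starting at 1000ms; this is just the arithmetic, not the actual S3A/S3Guard retry policy:
{code}
// Arithmetic only: doubling backoff, 1000ms base, 6 retries.
public class BackoffMath {
  public static void main(String[] args) {
    long delay = 1000;           // ms
    long total = 0;
    for (int retry = 1; retry <= 6; retry++) {
      System.out.printf("retry %d: sleep %d ms%n", retry, delay);
      total += delay;
      delay *= 2;                // 1s, 2s, 4s, 8s, 16s, 32s
    }
    // total = 63000 ms, last gap = 32000 ms: close to the ~64s/~30s above
    System.out.printf("total sleep %d ms, last gap %d ms%n", total, delay / 2);
  }
}
{code}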
Note: because of that deleteOnExit call, without S3Guard the rename() will fail
sooner, at the first "does the source exist" check, which we've got past here.
Regarding deleteOnExit: I'm never a fan of it, because all it does is create a
codepath which is only reached on a successful shutdown, and when the
connection to (HDFS, S3, ABFS) is playing up it can block that teardown. I'm
tempted to cut it out of this command, or at least postpone it until after the
file has reached the store. If we had a {{StreamCapabilities}} probe "file
visible after creation" then maybe we could skip it for this and other stores.
Future work.
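For the record, a sketch of what such a probe could look like; the capability string "fs.capability.create.visible" is invented here and doesn't exist today:
{code}
// Hypothetical sketch: only register deleteOnExit when the store declares
// that a newly created file is immediately visible; otherwise skip the
// exists() probe which would cache a 404 against the path.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class DeleteOnExitIfVisible {
  static void maybeRegister(FileSystem fs, FSDataOutputStream out, Path path)
      throws IOException {
    // "fs.capability.create.visible" is a made-up capability name
    if (out.hasCapability("fs.capability.create.visible")) {
      fs.deleteOnExit(path);
    }
  }
}
{code}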
This is the only call to deleteOnExit in the production codebase other than
org.apache.hadoop.yarn.server.sharedcachemanager.CleanerService, and that one
calls it after the file has been created.
> S3GuardExistsRetryPolicy handle FNFE eventual consistency better
> ----------------------------------------------------------------
>
> Key: HADOOP-16490
> URL: https://issues.apache.org/jira/browse/HADOOP-16490
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> If S3Guard is encountering delayed consistency (FNFE from tombstone; failure
> to open file) then
> * it only retries with the same timings as everything else. We should make it
> independently configurable
> * when an FNFE is finally thrown, rename() treats it as being caused by the
> original source path missing, when in fact it's something else. Proposed:
> somehow propagate the failure up differently, probably in the
> S3AFileSystem.copyFile() code