[
https://issues.apache.org/jira/browse/HADOOP-16490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901946#comment-16901946
]
Steve Loughran commented on HADOOP-16490:
-----------------------------------------
Update
Here are some traces of "hadoop fs -copyFromLocal" without the -d option
encountering a failure from a cached 404, while other requests are succeeding.
If I include HADOOP-13884 in this patch we'd eliminate the first of these HEAD
requests, but #2 comes from {{CommandWithDestination.create()}} attempting to
register the stream it has just opened for deleteOnExit().
FileSystem.deleteOnExit then does an exists() and, iff that returns true,
queues the delete. In this case that generates a 404 (the file doesn't exist
yet), which is going to be cached.
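For reference, the registration path is just an existence probe plus adding the path to a set; here's a rough paraphrase of {{FileSystem.deleteOnExit()}} from memory (field names approximate), showing where the HEAD comes from:
{code}
// Rough paraphrase of FileSystem.deleteOnExit(); the exists() probe is the
// call which issues a HEAD against the not-yet-written path, so S3 gets a
// 404 to cache for it.
public boolean deleteOnExit(Path f) throws IOException {
  if (!exists(f)) {              // HEAD -> 404: file hasn't been written yet
    return false;
  }
  synchronized (deleteOnExit) {
    deleteOnExit.add(f);         // queued for deletion in processDeleteOnExit()
  }
  return true;
}
{code}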
This is the trace with S3Guard enabled, auth mode = false. We get the record
from DDB, so the rename(source, dest) gets as far as the copy operation before
hitting the FNFE, and then, while it retries, never gets the 200 + data.
Either this is a long-lived cached entry, or the HEADs taking place in the
retry loop are too close together and the 404 entry keeps being refreshed.
{code}
10:18:24,181 HEAD jar1.jar._COPYING_ (S3aFS.create )
10:18:24,944 HEAD jar1.jar._COPYING_ deleteOnExit() registration
10:18:25,733 PUT jar1.jar._COPYING_ (out
10:18:26,740 HEAD jar1.jar._COPYING_ as out of band check at the start of rename
10:18:27,411 HEAD jar1.jar._COPYING_ as preamble to COPY
...repeated until 10:18:38,939 and final failure
{code}
So there are only 3 seconds from the first HEAD to those of the copy operation:
the copy isn't retrying for long enough, and its retries are close enough
together that the cached 404 entry keeps getting refreshed.
Other files uploaded fine
{code}
rm.txt._COPYING_
2019-08-05 10:10:03,094 HEAD -> 404
2019-08-05 10:10:03,375 HEAD -> 404
2019-08-05 10:10:03,700 PUT -> 200
2019-08-05 10:10:04,669 HEAD -> 200
2019-08-05 10:10:05,219 HEAD -> 200
2019-08-05 10:10:05,565 COPY -> 200
viewfs_rm.txt_COPYING_
2019-08-05 10:10:06,495 HEAD -> 404
2019-08-05 10:10:06,738 HEAD -> 404
2019-08-05 10:10:07,336 PUT -> 200
2019-08-05 10:10:08,119 HEAD -> 200
2019-08-05 10:10:08,799 HEAD -> 200
2019-08-05 10:10:09,058 COPY -> 200
{code}
Thoughts
# Eliminating the HEAD in open() will save time and avoid a 404 arising there
# but the deleteOnExit registration attempt will issue a HEAD anyway, so the
caching problem will still surface
# rather than just adding a special S3Guard retry, let's change the retry
policy to exponential and pick a set of numbers which back off nicely, e.g.
1000ms + 6 retries, giving roughly 64s of total retry and a ~30s gap at the
last one? (See the arithmetic sketch below.)
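To sanity-check those numbers, a tiny standalone sketch of a doubling backoff starting at 1000ms; this is just the arithmetic, not the actual S3A/S3Guard retry policy:
{code}
// Arithmetic only: doubling backoff, 1000ms base, 6 retries.
public class BackoffMath {
  public static void main(String[] args) {
    long delay = 1000;           // ms
    long total = 0;
    for (int retry = 1; retry <= 6; retry++) {
      System.out.printf("retry %d: sleep %d ms%n", retry, delay);
      total += delay;
      delay *= 2;                // 1s, 2s, 4s, 8s, 16s, 32s
    }
    // total = 63000 ms, last gap = 32000 ms: close to the ~64s/~30s above
    System.out.printf("total sleep %d ms, last gap %d ms%n", total, delay / 2);
  }
}
{code}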
Note: because of that deleteOnExit call, without S3Guard the rename() will fail
sooner, at the first "does the source exist" check, which we've got past here.
Regarding deleteOnExit: I'm never a fan of it, because all it does is create a
codepath which is only reached on a successful shutdown, and when the
connection to (HDFS, S3, ABFS) is playing up it can block that teardown. I'm
tempted to cut it out of this command, or at least postpone it until after the
file has reached the store. If we had a {{StreamCapabilities}} probe "file
visible after creation" then maybe we could skip it for this and other stores.
Future work.
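For the record, a sketch of what such a probe could look like; the capability string "fs.capability.create.visible" is invented here and doesn't exist today:
{code}
// Hypothetical sketch: only register deleteOnExit when the store declares
// that a newly created file is immediately visible; otherwise skip the
// exists() probe which would cache a 404 against the path.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class DeleteOnExitIfVisible {
  static void maybeRegister(FileSystem fs, FSDataOutputStream out, Path path)
      throws IOException {
    // "fs.capability.create.visible" is a made-up capability name
    if (out.hasCapability("fs.capability.create.visible")) {
      fs.deleteOnExit(path);
    }
  }
}
{code}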
This is the only call to deleteOnExit in the production codebase other than
org.apache.hadoop.yarn.server.sharedcachemanager.CleanerService, and that one
calls it after the file has been created.
> S3GuardExistsRetryPolicy handle FNFE eventual consistency better
> ----------------------------------------------------------------
>
> Key: HADOOP-16490
> URL: https://issues.apache.org/jira/browse/HADOOP-16490
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> If S3Guard is encountering delayed consistency (FNFE from tombstone; failure
> to open file) then
> * it only retries with the same timings as everything else. We should make it
> independently configurable
> * when an FNFE is finally thrown, rename() treats it as being caused by the
> original source path missing, when in fact it's something else. Proposed:
> somehow propagate the failure up differently, probably in the
> S3AFileSystem.copyFile() code