[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233836#comment-16233836 ]
ASF GitHub Bot commented on HDFS-10323:
---------------------------------------
GitHub user wenxinhe opened a pull request:
https://github.com/apache/hadoop/pull/287
HDFS-10323. transient deleteOnExit failure in ViewFileSystem due to close()
ordering
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wenxinhe/hadoop trunk
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hadoop/pull/287.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #287
----
commit a8b39e070b09005b2781ee46a9b2f3a09c04246e
Author: wenxinhe
Date: 2017-11-01T09:05:16Z
HDFS-10323. transient deleteOnExit failure in ViewFileSystem due to close()
ordering
----
> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> ------------------------------------------------------------------------
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: federation
> Affects Versions: 2.6.0
> Reporter: Ben Podgursky
> Assignee: Wenxin He
> Priority: Major
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem swallows the underlying error, it is difficult to be sure
> what the cause is, but I believe the ViewFileSystem’s child FileSystems are
> being close()’d before the ViewFileSystem itself, because ClientFinalizer
> closes FileSystems in random order. When the ViewFileSystem then tries to
> close(), it forwards the pending delete() calls to the appropriate child,
> and they fail because that child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it
> involves testing behavior on actual JVM shutdown. However, I can verify that
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first
> glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all
> other FileSystems.
> I would appreciate any thoughts on whether this diagnosis seems accurate,
> and thoughts (or help) on the fix.
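
The ordering problem described above, and fix option 1, can be illustrated
with a small self-contained toy model. This is only a sketch under stated
assumptions: ChildFs, HoldingView and ForwardingView are hypothetical
stand-ins invented for illustration, not Hadoop classes, and the code does
not reflect how ViewFileSystem or the pull request is actually implemented.
It only shows why a view that keeps deleteOnExit paths itself can lose the
delete when its child is closed first, and why forwarding the registration
to the child makes the outcome independent of close() order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the close() ordering problem. No class here is a Hadoop class. */
class CloseOrderSketch {

    /** Hypothetical stand-in for a concrete child file system. */
    static class ChildFs {
        final Set<String> files = new LinkedHashSet<>();
        final Set<String> deleteOnExit = new LinkedHashSet<>();
        boolean closed = false;

        boolean delete(String path) {
            if (closed) {
                return false; // a closed file system can no longer service calls
            }
            return files.remove(path);
        }

        void deleteOnExit(String path) {
            deleteOnExit.add(path);
        }

        void close() {
            // Process this file system's own deleteOnExit paths, then mark it closed.
            for (String path : deleteOnExit) {
                delete(path);
            }
            deleteOnExit.clear();
            closed = true;
        }
    }

    /** Hypothetical view that holds deleteOnExit paths itself (the failing pattern). */
    static class HoldingView {
        final ChildFs child;
        final List<String> deleteOnExit = new ArrayList<>();

        HoldingView(ChildFs child) {
            this.child = child;
        }

        void deleteOnExit(String path) {
            deleteOnExit.add(path);
        }

        void close() {
            // Forwards delete() at close time; this fails if the child is already closed.
            for (String path : deleteOnExit) {
                child.delete(path);
            }
        }
    }

    /** Hypothetical view that forwards deleteOnExit to the child (option 1). */
    static class ForwardingView {
        final ChildFs child;

        ForwardingView(ChildFs child) {
            this.child = child;
        }

        void deleteOnExit(String path) {
            child.deleteOnExit(path);
        }

        void close() {
            // Nothing to delete here; the child owns the registered path.
        }
    }

    /** Returns true if the temporary path is still present after both close() calls. */
    static boolean leftoverWithHoldingView(boolean childClosedFirst) {
        ChildFs child = new ChildFs();
        child.files.add("/tmp/x");
        HoldingView view = new HoldingView(child);
        view.deleteOnExit("/tmp/x");
        if (childClosedFirst) {
            child.close();
            view.close();
        } else {
            view.close();
            child.close();
        }
        return child.files.contains("/tmp/x");
    }

    /** Same scenario, but with the registration forwarded to the child. */
    static boolean leftoverWithForwardingView(boolean childClosedFirst) {
        ChildFs child = new ChildFs();
        child.files.add("/tmp/x");
        ForwardingView view = new ForwardingView(child);
        view.deleteOnExit("/tmp/x");
        if (childClosedFirst) {
            child.close();
            view.close();
        } else {
            view.close();
            child.close();
        }
        return child.files.contains("/tmp/x");
    }

    public static void main(String[] args) {
        for (boolean childFirst : new boolean[] {true, false}) {
            System.out.println("childClosedFirst=" + childFirst
                + " holdingLeftover=" + leftoverWithHoldingView(childFirst)
                + " forwardingLeftover=" + leftoverWithForwardingView(childFirst));
        }
    }
}
```

In this model the holding view leaves the path behind only when the child is
closed first, which would match an intermittent failure if the close order
is effectively random. Option 2 from the description (closing all views
before any other file system) would correspond to always taking the
view-first branch in the holding case.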