[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256861#comment-15256861 ]
Colin Patrick McCabe commented on HDFS-10323:
---------------------------------------------
Thanks for the detailed bug report, [~bpodgursky].
bq. 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child
FileSystem, and not hold onto that path itself.
This would be an incompatible change, right? It seems like a lot of code
calling {{FS#close}} might not work with this change.
bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then
all other FileSystems.
This seems like the safest way to go.
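To make the ordering idea concrete, here is a rough sketch. It uses hypothetical stand-in classes ({{SimpleFs}}, {{ViewFs}}, and a free-standing {{closeAll}}), not the real {{FileSystem}}, {{ViewFileSystem}}, or {{FileSystem.Cache}} code, and it assumes a view forwards {{delete}} to a single child. It only shows why closing views first lets their deleteOnExit paths be processed while the children are still open.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these classes are Hadoop classes.
class CloseOrderSketch {

    /** Hypothetical stand-in for a concrete file system. */
    static class SimpleFs {
        final String name;
        final List<String> log;
        final List<String> deleteOnExit = new ArrayList<>();
        boolean closed = false;

        SimpleFs(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        /** Deleting through an already-closed file system fails. */
        boolean delete(String path) {
            if (closed) {
                log.add("FAILED delete " + path + " on closed " + name);
                return false;
            }
            log.add("deleted " + path + " via " + name);
            return true;
        }

        /** On close, process the pending deleteOnExit paths, then mark closed. */
        void close() {
            for (String path : deleteOnExit) {
                delete(path);
            }
            deleteOnExit.clear();
            closed = true;
        }
    }

    /** Hypothetical view that holds its own deleteOnExit paths but forwards delete(). */
    static class ViewFs extends SimpleFs {
        final SimpleFs child;

        ViewFs(String name, SimpleFs child, List<String> log) {
            super(name, log);
            this.child = child;
        }

        @Override
        boolean delete(String path) {
            return child.delete(path);
        }
    }

    /** Close every view first, then everything else, regardless of cache order. */
    static void closeAll(List<SimpleFs> cache) {
        for (SimpleFs fs : cache) {
            if (fs instanceof ViewFs) {
                fs.close();
            }
        }
        for (SimpleFs fs : cache) {
            if (!(fs instanceof ViewFs)) {
                fs.close();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SimpleFs child = new SimpleFs("hdfs", log);
        ViewFs view = new ViewFs("viewfs", child, log);
        view.deleteOnExit.add("/tmp/delete_on_exit_test");

        // The child is listed first to mimic the unlucky ordering from the report.
        List<SimpleFs> cache = new ArrayList<>();
        cache.add(child);
        cache.add(view);
        closeAll(cache);
        log.forEach(System.out::println);
    }
}
```

With a single pass over the cache in list order, the same setup would close the child first and the forwarded delete would fail, which matches the symptom in the report.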
> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> ------------------------------------------------------------------------
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: federation
> Reporter: Ben Podgursky
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the
> error is, but I believe what is happening is that the ViewFileSystem’s child
> FileSystems are being close()’d before the ViewFileSystem, due to the random
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem
> tries to close(), it tries to forward the delete() calls to the appropriate
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it
> involves testing behavior on actual JVM shutdown. However, I can verify that
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem)fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first
> glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all
> other FileSystems.
> Would appreciate any thoughts on whether this seems accurate, and thoughts
> (or help) on the fix.