[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

genericqa (JIRA) Tue, 20 Mar 2018 13:27:20 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406957#comment-16406957
 ]


genericqa commented on HDFS-10323:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-10323 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10323 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896812/HDFS-10323.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23567/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10323
>                 URL: https://issues.apache.org/jira/browse/HDFS-10323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: federation
>    Affects Versions: 2.6.0, 2.7.4, 3.0.0-beta1
>            Reporter: Ben Podgursky
>            Assignee: Wenxin He
>            Priority: Major
>         Attachments: HDFS-10323.001.patch, HDFS-10323.002.patch, 
> HDFS-10323.003.patch
>
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown.  However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir); 
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem)fs1; 
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {   
>   if (fileSystem.exists(randomTemporaryDir)) {     
>     fileSystem.deleteOnExit(randomTemporaryDir);   
>   }
>  } 
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.  
> Would appreciate any thoughts of whether this seems accurate, and thoughts 
> (or help) on the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

Reply via email to