Erik Krogen created HDFS-11208:
----------------------------------

             Summary: Deadlock in WebHDFS on shutdown
                 Key: HDFS-11208
                 URL: https://issues.apache.org/jira/browse/HDFS-11208
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: webhdfs
    Affects Versions: 3.0.0-alpha1, 2.6.5, 2.7.3, 2.8.0
            Reporter: Erik Krogen
            Assignee: Erik Krogen


On the client side, a deadlock may occur if the {{DelegationTokenRenewer}} 
attempts to renew a WebHdfs delegation token while the client is shutting down 
(i.e. while {{FileSystem.Cache.ClientFinalizer}} is running). 
{{ClientFinalizer}} calls {{FileSystem.Cache.closeAll()}}, which first takes a 
lock on the {{FileSystem.Cache}} object and then locks each filesystem in the 
cache as it iterates over them. {{DelegationTokenRenewer}} acquires the same 
two locks in the opposite order: it holds a lock on a filesystem object while 
renewing that filesystem's token, and {{TokenAspect.TokenManager.renew()}} 
(used to renew WebHdfs tokens) calls {{FileSystem.get}}, which in turn takes a 
lock on the {{FileSystem.Cache}} object. If {{ClientFinalizer}} is running at 
that moment, each thread can end up waiting for the lock the other holds.
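
The lock-order inversion described above can be reproduced in isolation. The following is only a minimal sketch, not Hadoop code: the class {{LockOrderInversionDemo}}, its methods, and the two {{ReentrantLock}} fields are hypothetical stand-ins for the {{FileSystem.Cache}} monitor and the monitor of one cached filesystem. It uses {{tryLock}} with a timeout in place of {{synchronized}} so that the program terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Hypothetical stand-ins for the two monitors named in the report.
    static final ReentrantLock cacheLock = new ReentrantLock();      // FileSystem.Cache
    static final ReentrantLock fileSystemLock = new ReentrantLock(); // one cached filesystem

    /**
     * Takes `first`, waits until the peer thread also holds its first lock,
     * then tries to take `second` with a timeout. Returns whether `second`
     * was acquired. `first` is held until both threads have finished trying.
     */
    static boolean acquireInOrder(ReentrantLock first, ReentrantLock second,
                                  CountDownLatch bothHoldFirst,
                                  CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();
            return gotSecond;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    /** Runs both lock orders concurrently; index 0 = finalizer, 1 = renewer. */
    static boolean[] run() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // Shutdown path: cache lock first, then the filesystem lock.
        Thread finalizer = new Thread(() -> acquired[0] =
            acquireInOrder(cacheLock, fileSystemLock, bothHoldFirst, bothTried),
            "ClientFinalizer-sketch");
        // Renewal path: filesystem lock first, then the cache lock.
        Thread renewer = new Thread(() -> acquired[1] =
            acquireInOrder(fileSystemLock, cacheLock, bothHoldFirst, bothTried),
            "DelegationTokenRenewer-sketch");

        finalizer.start();
        renewer.start();
        try {
            finalizer.join();
            renewer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    public static void main(String[] args) {
        boolean[] acquired = run();
        System.out.println("finalizer got filesystem lock: " + acquired[0]);
        System.out.println("renewer got cache lock: " + acquired[1]);
    }
}
```

Neither thread obtains its second lock in this sketch, because each one holds the lock the other is waiting for. With {{synchronized}} methods, as in the real code paths, there is no timeout and both threads would block indefinitely.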

See below for example deadlock output:
{code}
Found one Java-level deadlock:
=============================
"Thread-8572":
  waiting to lock monitor 0x00007eff401f9878 (object 0x000000051ec3f930, a dali.hdfs.web.WebHdfsFileSystem),
  which is held by "FileSystem-DelegationTokenRenewer"
"FileSystem-DelegationTokenRenewer":
  waiting to lock monitor 0x00007f005c08f5c8 (object 0x000000050389c8b8, a dali.fs.FileSystem$Cache),
  which is held by "Thread-8572"

Java stack information for the threads listed above:
===================================================
"Thread-8572":
   at dali.hdfs.web.WebHdfsFileSystem.close(WebHdfsFileSystem.java:864)
   - waiting to lock <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
   at dali.fs.FilterFileSystem.close(FilterFileSystem.java:449)
   at dali.fs.FileSystem$Cache.closeAll(FileSystem.java:2407)
   - locked <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
   at dali.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424)
   - locked <0x000000050389c8d0> (a dali.fs.FileSystem$Cache$ClientFinalizer)
   at dali.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
"FileSystem-DelegationTokenRenewer":
   at dali.fs.FileSystem$Cache.getInternal(FileSystem.java:2343)
   - waiting to lock <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
   at dali.fs.FileSystem$Cache.get(FileSystem.java:2332)
   at dali.fs.FileSystem.get(FileSystem.java:369)
   at dali.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:92)
   at dali.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:72)
   at dali.security.token.Token.renew(Token.java:373)
   at dali.fs.DelegationTokenRenewer$RenewAction.renew(DelegationTokenRenewer.java:127)
   - locked <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
   at dali.fs.DelegationTokenRenewer$RenewAction.access$300(DelegationTokenRenewer.java:57)
   at dali.fs.DelegationTokenRenewer.run(DelegationTokenRenewer.java:258)

Found 1 deadlock.
{code}



