[
https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Junping Du updated HDFS-9294:
-----------------------------
Target Version/s: 2.7.2, 2.6.4 (was: 2.7.2)
> DFSClient deadlock when close file and failed to renew lease
> -------------------------------------------------------------
>
> Key: HDFS-9294
> URL: https://issues.apache.org/jira/browse/HDFS-9294
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.2.0, 2.7.1
> Environment: Hadoop 2.2.0
> Reporter: 邓飞
> Assignee: Brahma Reddy Battula
> Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: HDFS-9294-002.patch, HDFS-9294-002.patch,
> HDFS-9294-branch-2.7.patch, HDFS-9294-branch-2.patch, HDFS-9294.patch
>
>
> We found a deadlock on our HBase (0.98) cluster (running on Hadoop 2.2.0).
> It appears to be an HDFS bug; at the time, our network was unstable.
> Below is the stack:
> *************************************************************************************************************************************
> Found one Java-level deadlock:
> =============================
> "MemStoreFlusher.1":
> waiting to lock monitor 0x00007ff27cfa5218 (object 0x00000002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer),
> which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
> waiting to lock monitor 0x00007ff2e67e16a8 (object 0x0000000486ce6620, a org.apache.hadoop.hdfs.DFSOutputStream),
> which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
> waiting to lock monitor 0x00007ff27cfa5218 (object 0x00000002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer),
> which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> Java stack information for the threads listed above:
> ===================================================
> "MemStoreFlusher.1":
> at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> - waiting to lock <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> - locked <0x000000055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> - locked <0x000000059869eed8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
> at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
> at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
> - waiting to lock <0x0000000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
> at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
> - locked <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> at java.lang.Thread.run(Thread.java:744)
> "MemStoreFlusher.0":
> at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> - waiting to lock <0x00000002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> - locked <0x0000000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> - locked <0x00000004888f6848> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
> at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
> at java.lang.Thread.run(Thread.java:744)
> Found 1 deadlock.
> **********************************************************************
> The thread "MemStoreFlusher.0" is closing an output stream and removing its
> lease: DFSOutputStream.close() holds the stream's monitor and then needs the
> LeaseRenewer's monitor to end the lease.
> Meanwhile, the daemon thread "LeaseRenewer" failed to renew the lease with the
> active NameNode: because the network was unstable it got a
> SocketTimeoutException, so it aborts the client's output streams while holding
> the LeaseRenewer monitor, and blocks waiting for the stream's monitor.
> The two threads acquire the same pair of locks in opposite orders, so they
> deadlock.
> The problem still seems present in Hadoop 2.7.1. If confirmed, we can fix the
> issue.
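> The lock-ordering conflict described above can be reduced to a minimal,
> self-contained Java sketch (class and lock names are hypothetical stand-ins
> for the LeaseRenewer and DFSOutputStream monitors, not the actual HDFS code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class LockOrderDemo {
    // Hypothetical stand-ins for the two monitors seen in the thread dump.
    static final Object renewerLock = new Object(); // ~ LeaseRenewer
    static final Object streamLock  = new Object(); // ~ DFSOutputStream

    public static void main(String[] args) throws Exception {
        // Like MemStoreFlusher: close() locks the stream, then needs the renewer.
        Thread flusher = new Thread(() -> {
            synchronized (streamLock) {
                pause(100);
                synchronized (renewerLock) { /* endFileLease() analogue */ }
            }
        }, "MemStoreFlusher");

        // Like LeaseRenewer: run() locks the renewer, then aborts the stream.
        Thread renewer = new Thread(() -> {
            synchronized (renewerLock) {
                pause(100);
                synchronized (streamLock) { /* abort() analogue */ }
            }
        }, "LeaseRenewer");

        flusher.start();
        renewer.start();

        Thread.sleep(500); // give both threads time to block on each other
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        System.out.println(ids != null && ids.length == 2
                ? "deadlock detected" : "no deadlock");
        System.exit(0); // the two stuck threads would otherwise hang the JVM
    }

    static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```

> The usual fix for this pattern is to establish a single lock order, or to
> release one monitor before acquiring the other (e.g. end the lease outside
> the stream's synchronized section).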
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)