[ https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447422#comment-15447422 ]
Andrew Wang commented on HDFS-7489:
-----------------------------------
FYI for git log greppers, this commit is missing the JIRA ID.
> Incorrect locking in FsVolumeList#checkDirs can hang datanodes
> --------------------------------------------------------------
>
> Key: HDFS-7489
> URL: https://issues.apache.org/jira/browse/HDFS-7489
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.5.0, 2.6.0
> Reporter: Noah Lorang
> Assignee: Noah Lorang
> Priority: Critical
> Labels: 2.6.1-candidate
> Fix For: 2.6.1
>
> Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch, HDFS-7489-v2.patch.1
>
>
> After upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hang their heartbeats and requests from clients. After some digging, I identified the culprit as the checkDiskError() call triggered by caught IOExceptions (in our case, SocketExceptions triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode).
> Thread dumps reveal that the checkDiskError() thread is holding a lock on the FsVolumeList:
> {code}
> "Thread-409" daemon prio=10 tid=0x00007f4e50200800 nid=0x5b8e runnable
> [0x00007f4e2f855000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.list(Native Method)
> at java.io.File.list(File.java:973)
> at java.io.File.listFiles(File.java:1051)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180)
> - locked <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
> at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> Other threads would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.:
> {code}
> "DataXceiver for client at /10.10.0.52:46643 [Receiving block
> BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_1099996962574]" daemon
> prio=10 tid=0x00007f4e55561000 nid=0x406d waiting for monitor entry
> [0x00007f4e3106d000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64)
> - waiting to lock <0x000000063b182ea0> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927)
> - locked <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> That lock on the FsDatasetImpl then causes other threads to block:
> {code}
> "Thread-127" daemon prio=10 tid=0x00007f4e4c67d800 nid=0x2e02 waiting for
> monitor entry [0x00007f4e33390000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228)
> - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684)
> at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:650)
> at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
> "DataNode: [[[DISK]file:/data/0/dfs/dn/, [DISK]file:/data/1/dfs/dn/,
> [DISK]file:/data/2/dfs/dn/, [DISK]file:/data/3/dfs/dn/,
> [DISK]file:/data/4/dfs/dn/, [DISK]file:/data/5/dfs/dn/,
> [DISK]file:/data/6/dfs/dn/, [DISK]file:/data/7/dfs/dn/,
> [DISK]file:/data/8/dfs/dn/, [DISK]file:/data/9/dfs/dn/,
> [DISK]file:/data/10/dfs/dn/, [DISK]file:/data/11/dfs/dn/,
> [DISK]file:/data/12/dfs/dn/, [DISK]file:/data/13/dfs/dn/,
> [DISK]file:/data/14/dfs/dn/, [DISK]file:/data/15/dfs/dn/,
> [DISK]file:/data/16/dfs/dn/, [DISK]file:/data/17/dfs/dn/,
> [DISK]file:/data/18/dfs/dn/, [DISK]file:/data/19/dfs/dn/,
> [DISK]file:/data/20/dfs/dn/, [DISK]file:/data/21/dfs/dn/,
> [DISK]file:/data/22/dfs/dn/, [DISK]file:/data/23/dfs/dn/,
> [DISK]file:/data/24/dfs/dn/, [DISK]file:/data/25/dfs/dn/,
> [DISK]file:/data/26/dfs/dn/, [DISK]file:/data/27/dfs/dn/,
> [DISK]file:/data/28/dfs/dn/, [DISK]file:/data/29/dfs/dn/,
> [DISK]file:/data/30/dfs/dn/, [DISK]file:/data/31/dfs/dn/,
> [DISK]file:/data/32/dfs/dn/, [DISK]file:/data/33/dfs/dn/,
> [DISK]file:/data/34/dfs/dn/, [DISK]file:/data/35/dfs/dn/]] heartbeating to
> bigdata-01.sc-chi-int.37signals.com/10.10.0.211:8020" daemon prio=10
> tid=0x00007f4e553a5800 nid=0x2d66 waiting for monitor entry
> [0x00007f4e361be000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:116)
> - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:132)
> - locked <0x000000063b182d80> (a java.lang.Object)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:572)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
> at java.lang.Thread.run(Thread.java:662)
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1948416574_103 at
> /10.10.0.169:44229 [Receiving block
> BP-1573746465-127.0.1.1-1352244533715:blk_1073776794_1099996963072]" daemon
> prio=10 tid=0x00007f4e55431000 nid=0x4ab8 waiting for monitor entry
> [0x00007f4e2a448000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:782)
> - waiting to lock <0x000000063b1f9a48> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:101)
> at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:171)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> This occurs, I believe, because FsVolumeList#checkDirs() locks the entire FsVolumeList to prevent multiple simultaneous checkDirs() calls. On a slow system, under high I/O, or with many disks, we have observed the checkDirs() call taking dozens of seconds or longer.
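> To make the lock-holding pattern concrete, here is a minimal sketch of the shape of the problem, using purely illustrative classes (Volume, VolumeList) rather than the real Hadoop sources: the synchronized checkDirs() holds the list's monitor for the whole disk scan, so any other synchronized method on the list, such as getNextVolume(), blocks until the scan completes.
> {code}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Iterator;
> import java.util.List;
>
> class Volume {
>     void checkDirs() throws IOException {
>         // Walks every block directory on disk; on a busy or slow system
>         // this can take tens of seconds per volume.
>     }
> }
>
> class VolumeList {
>     private final List<Volume> volumes = new ArrayList<>();
>
>     // The entire scan runs while holding the VolumeList monitor...
>     synchronized List<Volume> checkDirs() {
>         List<Volume> removed = new ArrayList<>();
>         for (Iterator<Volume> it = volumes.iterator(); it.hasNext(); ) {
>             Volume v = it.next();
>             try {
>                 v.checkDirs();
>             } catch (IOException e) {
>                 it.remove();          // drop the failed volume
>                 removed.add(v);
>             }
>         }
>         return removed;
>     }
>
>     // ...so any writer that needs a target volume blocks until the scan finishes.
>     synchronized Volume getNextVolume() {
>         return volumes.get(0);        // placeholder selection logic
>     }
> }
> {code}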
> I think this can be improved by holding a separate mutex limited to checkDirs() (much like DataNode.checkDiskError() does) and only locking the full FsVolumeList when the volume list actually needs to be updated because disks were removed.
> I'll attach a patch that does this.
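> As a rough sketch of that approach (again with the illustrative names from the sketch above; the attached patch is the authoritative version): checkDirs() serializes on a dedicated mutex, scans the volumes without holding the list's monitor, and only takes that monitor briefly to publish an updated volume list when disks have failed.
> {code}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> // Reuses the illustrative Volume class from the sketch above.
> class PatchedVolumeList {
>     private volatile List<Volume> volumes = new ArrayList<>();
>     private final Object checkDirsMutex = new Object();  // serializes checkDirs() only
>
>     List<Volume> checkDirs() {
>         synchronized (checkDirsMutex) {       // other checkDirs() callers wait here...
>             List<Volume> removed = new ArrayList<>();
>             List<Volume> healthy = new ArrayList<>();
>             for (Volume v : volumes) {        // ...but readers of the list do not
>                 try {
>                     v.checkDirs();            // slow disk scan, done without the list monitor
>                     healthy.add(v);
>                 } catch (IOException e) {
>                     removed.add(v);
>                 }
>             }
>             if (!removed.isEmpty()) {
>                 synchronized (this) {         // brief lock only to publish the updated list
>                     volumes = healthy;
>                 }
>             }
>             return removed;
>         }
>     }
>
>     synchronized Volume getNextVolume() {
>         return volumes.get(0);                // no longer blocked for the whole scan
>     }
> }
> {code}
> With that shape, callers such as getNextVolume() contend only for the brief list update, which matches the post-patch thread dump below: the lock held while checkDirs() runs is a plain java.lang.Object rather than the FsVolumeList itself.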
> We're running this patch in production and it's working as expected -- a lock
> is held locally instead of on the entire FsVolumeList while checkDirs() is
> running:
> {code}
> "Thread-614" daemon prio=10 tid=0x000000004037b000 nid=0x7331 runnable
> [0x00007f4d45391000]
> java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1157)
> at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:88)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:182)
> - locked <0x000000063b24f540> (a java.lang.Object)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396)
> at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> Feedback would be appreciated!