[
https://issues.apache.org/jira/browse/HDFS-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ade updated HDFS-10605:
-----------------------
Description:
The UT: TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten can be ran
successful, but deadlock like HDFS-9874 maybe happen online.
* UT:
{code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
dn.data = Mockito.spy(data);
LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" +
dn.data.hashCode());
doAnswer(new Answer<Object>() {
public Object answer(InvocationOnMock invocation)
throws IOException, InterruptedException {
Thread.sleep(1000);
// Bypass the argument to FsDatasetImpl#finalizeBlock to verify that
// the block is not removed, since the volume reference should not
// be released at this point.
data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
return null;
}
}).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
{code}
Two thread can run synchronized method dn.data.removeVolumes and
data.finalizeBlock concurrently because dn.data(mocked) and data is not the
same object(hash 1903955157 and 1508483764).
{noformat}
2016-07-11 16:16:07,788 INFO [Thread-0] datanode.TestDataNodeHotSwapVolumes
(TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599))
- data hash:1903955157; dn.data hash:1508483764
2016-07-11 16:16:07,801 INFO [Thread-157] datanode.DataNode
(DataNode.java:reconfigurePropertyImpl(456)) - Reconfiguring
dfs.datanode.data.dir to
[DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
2016-07-11 16:16:07,810 WARN [Thread-157] common.Util
(Util.java:stringAsURI(56)) - Path
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
should be specified as a URI in configuration files. Please update hdfs
configuration.
2016-07-11 16:16:07,811 INFO [Thread-157] datanode.DataNode
(DataNode.java:removeVolumes(674)) - Deactivating volumes (clear failure=true):
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
2016-07-11 16:16:07,836 INFO [Thread-157] impl.FsDatasetImpl
(FsDatasetImpl.java:removeVolumes(459)) - Removing
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
from FsDataset.
2016-07-11 16:16:07,836 INFO [Thread-157] impl.FsDatasetImpl
(FsDatasetImpl.java:removeVolumes(463)) - removeVolumes of object
hash:1508483764
2016-07-11 16:16:07,836 INFO [Thread-157] datanode.BlockScanner
(BlockScanner.java:removeVolumeScanner(243)) - Removing scanner for volume
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
(StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
2016-07-11 16:16:07,836 INFO
[VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
datanode.VolumeScanner (VolumeScanner.java:run(630)) -
VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
2016-07-11 16:16:07,891 INFO [IPC Server handler 7 on 63546]
blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage
[DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548 from
DataNode127.0.0.1:63548
2016-07-11 16:16:07,908 INFO [IPC Server handler 9 on 63546]
blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(866))
- Adding new storage ID DS-f4df3404-9f02-470e-b202-75f5a4de29cb for DN
127.0.0.1:63548
2016-07-11 16:16:08,845 INFO [PacketResponder:
BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001,
type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl
(FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of object
hash:1903955157
2016-07-11 16:16:12,933 INFO [DataXceiver for client at /127.0.0.1:63574
[Receiving block BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]]
impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of
object hash:1903955157
{noformat}
The UT ran passed.
* Online
When dn.data.removeVolumes the thread run in FsVolumeImpl.closeAndWait() with
dn.data lock and wait referenceCount() = 0, but the other DataXceiver thread
maybe blocked by dn.data lock and with referencing volume. This can be happened
like HDFS-9874.
was:
The UT: TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten can be ran
successful, but deadlock like HDFS-9874 maybe happen online.
* UT:
{code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
dn.data = Mockito.spy(data);
LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" +
dn.data.hashCode());
doAnswer(new Answer<Object>() {
public Object answer(InvocationOnMock invocation)
throws IOException, InterruptedException {
Thread.sleep(1000);
// Bypass the argument to FsDatasetImpl#finalizeBlock to verify that
// the block is not removed, since the volume reference should not
// be released at this point.
data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
return null;
}
}).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
{code}
Two thread can run synchronized method dn.data.removeVolumes and
data.finalizeBlock concurrently because dn.data(mocked) and data is not the
same object(hash 1903955157 and 1508483764).
{noformat}
2016-07-11 16:16:07,788 INFO [Thread-0] datanode.TestDataNodeHotSwapVolumes
(TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599))
- data hash:1903955157; dn.data hash:1508483764
2016-07-11 16:16:07,801 INFO [Thread-157] datanode.DataNode
(DataNode.java:reconfigurePropertyImpl(456)) - Reconfiguring
dfs.datanode.data.dir to
[DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
2016-07-11 16:16:07,810 WARN [Thread-157] common.Util
(Util.java:stringAsURI(56)) - Path
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
should be specified as a URI in configuration files. Please update hdfs
configuration.
2016-07-11 16:16:07,811 INFO [Thread-157] datanode.DataNode
(DataNode.java:removeVolumes(674)) - Deactivating volumes (clear failure=true):
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
2016-07-11 16:16:07,836 INFO [Thread-157] impl.FsDatasetImpl
(FsDatasetImpl.java:removeVolumes(459)) - Removing
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
from FsDataset.
2016-07-11 16:16:07,836 INFO [Thread-157] impl.FsDatasetImpl
(FsDatasetImpl.java:removeVolumes(463)) - removeVolumes of object
hash:1508483764
2016-07-11 16:16:07,836 INFO [Thread-157] datanode.BlockScanner
(BlockScanner.java:removeVolumeScanner(243)) - Removing scanner for volume
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
(StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
2016-07-11 16:16:07,836 INFO
[VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
datanode.VolumeScanner (VolumeScanner.java:run(630)) -
VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
2016-07-11 16:16:07,891 INFO [IPC Server handler 7 on 63546]
blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage
[DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548 from
DataNode127.0.0.1:63548
2016-07-11 16:16:07,908 INFO [IPC Server handler 9 on 63546]
blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(866))
- Adding new storage ID DS-f4df3404-9f02-470e-b202-75f5a4de29cb for DN
127.0.0.1:63548
2016-07-11 16:16:08,845 INFO [PacketResponder:
BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001,
type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl
(FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of object
hash:1903955157
2016-07-11 16:16:12,933 INFO [DataXceiver for client at /127.0.0.1:63574
[Receiving block BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]]
impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of
object hash:1903955157
{noformat}
The UT ran passed.
* Online
When dn.data.removeVolumes the thread run in FsVolumeImpl.closeAndWait() with
dn.data lock and wait referenceCount() = 0, but the other DataXceiver thread
maybe blocked by dn.data lock and with referencing volume. This can be happened
like HDFS-9874.
* Potential issue in project
There are so many usage of Mockito.spy in project, could some issues like that
the UT passed but deadlock online ?
> Can not synchronized call method of object and Mockito.spy(object), So
> UT:testRemoveVolumeBeingWritten passed but maybe deadlock online
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10605
> URL: https://issues.apache.org/jira/browse/HDFS-10605
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2
> Reporter: ade
> Labels: test
>
> The UT: TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten can be ran
> successful, but deadlock like HDFS-9874 maybe happen online.
> * UT:
> {code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
> final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
> dn.data = Mockito.spy(data);
> LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" +
> dn.data.hashCode());
> doAnswer(new Answer<Object>() {
> public Object answer(InvocationOnMock invocation)
> throws IOException, InterruptedException {
> Thread.sleep(1000);
> // Bypass the argument to FsDatasetImpl#finalizeBlock to verify
> that
> // the block is not removed, since the volume reference should not
> // be released at this point.
> data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
> return null;
> }
> }).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
> {code}
> Two thread can run synchronized method dn.data.removeVolumes and
> data.finalizeBlock concurrently because dn.data(mocked) and data is not the
> same object(hash 1903955157 and 1508483764).
> {noformat}
> 2016-07-11 16:16:07,788 INFO [Thread-0] datanode.TestDataNodeHotSwapVolumes
> (TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599))
> - data hash:1903955157; dn.data hash:1508483764
> 2016-07-11 16:16:07,801 INFO [Thread-157] datanode.DataNode
> (DataNode.java:reconfigurePropertyImpl(456)) - Reconfiguring
> dfs.datanode.data.dir to
> [DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
> 2016-07-11 16:16:07,810 WARN [Thread-157] common.Util
> (Util.java:stringAsURI(56)) - Path
> /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
> should be specified as a URI in configuration files. Please update hdfs
> configuration.
> 2016-07-11 16:16:07,811 INFO [Thread-157] datanode.DataNode
> (DataNode.java:removeVolumes(674)) - Deactivating volumes (clear
> failure=true):
> /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
> 2016-07-11 16:16:07,836 INFO [Thread-157] impl.FsDatasetImpl
> (FsDatasetImpl.java:removeVolumes(459)) - Removing
> /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
> from FsDataset.
> 2016-07-11 16:16:07,836 INFO [Thread-157] impl.FsDatasetImpl
> (FsDatasetImpl.java:removeVolumes(463)) - removeVolumes of object
> hash:1508483764
> 2016-07-11 16:16:07,836 INFO [Thread-157] datanode.BlockScanner
> (BlockScanner.java:removeVolumeScanner(243)) - Removing scanner for volume
> /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
> (StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
> 2016-07-11 16:16:07,836 INFO
> [VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
> datanode.VolumeScanner (VolumeScanner.java:run(630)) -
> VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
> DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
> 2016-07-11 16:16:07,891 INFO [IPC Server handler 7 on 63546]
> blockmanagement.DatanodeDescriptor
> (DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage
> [DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548 from
> DataNode127.0.0.1:63548
> 2016-07-11 16:16:07,908 INFO [IPC Server handler 9 on 63546]
> blockmanagement.DatanodeDescriptor
> (DatanodeDescriptor.java:updateStorage(866)) - Adding new storage ID
> DS-f4df3404-9f02-470e-b202-75f5a4de29cb for DN 127.0.0.1:63548
> 2016-07-11 16:16:08,845 INFO [PacketResponder:
> BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001,
> type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl
> (FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock of object
> hash:1903955157
> 2016-07-11 16:16:12,933 INFO [DataXceiver for client at /127.0.0.1:63574
> [Receiving block BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]]
> impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559)) - finalizeBlock
> of object hash:1903955157
> {noformat}
> The UT ran passed.
> * Online
> When dn.data.removeVolumes the thread run in FsVolumeImpl.closeAndWait() with
> dn.data lock and wait referenceCount() = 0, but the other DataXceiver thread
> maybe blocked by dn.data lock and with referencing volume. This can be
> happened like HDFS-9874.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]