[
https://issues.apache.org/jira/browse/HDFS-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452935#comment-15452935
]
Manoj Govindassamy commented on HDFS-10819:
-------------------------------------------
Here is the log snippet from a failed test run; the failure signature is totally
different from the one seen in HDFS-10780.
{noformat}
1148 2016-08-25 18:21:19,853 [ReplicationMonitor] WARN net.NetworkTopology
(NetworkTopology.java:chooseRandom(816)) - Failed to find datanode (scope=""
excludedScope="/default-rack").
1149 2016-08-25 18:21:19,853 [ReplicationMonitor] WARN
blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(402)) - Failed to place enough
replicas, still in need of 1 to reach 3 (unavailableStorages=[],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more
information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
1150 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN net.NetworkTopology
(NetworkTopology.java:chooseRandom(816)) - Failed to find datanode (scope=""
excludedScope="/default-rack").
1151 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN
blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(402)) - Failed to place enough
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more
information, please enable DEBUG log level on
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
1152 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN
protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) -
Failed to place enough replicas: expected size is 1 but only 0 storage types
can be selected (replication=3, selected=[], unavailable=[DISK, ARCHIVE],
removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
1153 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN
blockmanagement.BlockPlacementPolicy
(BlockPlacementPolicyDefault.java:chooseTarget(402)) - Failed to place enough
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK, ARCHIVE],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All
required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE],
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK],
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
1154 2016-08-25 18:21:19,854 [ReplicationMonitor] DEBUG BlockStateChange
(BlockManager.java:computeReconstructionWorkForBlocks(1680)) - BLOCK*
neededReconstruction = 1 pendingReconstruction = 0
1155 2016-08-25 18:21:20,071 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 1 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid a8be1f44-ba6d-4a90-a380-ccab72e69aeb) service to
localhost/127.0.0.1:63869
1156 2016-08-25 18:21:20,684 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid ce51d074-8231-4ca6-8dff-21de2281910b) service to
localhost/127.0.0.1:63869
1157 2016-08-25 18:21:20,684 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid 31f5b7d6-6d2e-4410-8ca8-27bfe5b768fc) service to
localhost/127.0.0.1:63869
1158 2016-08-25 18:21:20,688 [IPC Server handler 0 on 63869] INFO
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true
ugi=manoj (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/test
dst=null perm=null proto=rpc
1159 2016-08-25 18:21:20,690 [IPC Server handler 9 on 63869] DEBUG
blockmanagement.BlockManager (BlockManager.java:createLocatedBlocks(1109)) -
blocks = [blk_1073741825_1002]
1160 2016-08-25 18:21:20,690 [IPC Server handler 9 on 63869] INFO
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true
ugi=manoj (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test
dst=null perm=null proto=rpc
1161 Block 0 of file /test has replication factor 2 (desired 3); locations
127.0.0.1:63870 127.0.0.1:63874
1162 2016-08-25 18:21:21,072 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 1 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid a8be1f44-ba6d-4a90-a380-ccab72e69aeb) service to
localhost/127.0.0.1:63869
1163 2016-08-25 18:21:21,690 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid ce51d074-8231-4ca6-8dff-21de2281910b) service to
localhost/127.0.0.1:63869
1164 2016-08-25 18:21:21,690 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid 31f5b7d6-6d2e-4410-8ca8-27bfe5b768fc) service to
localhost/127.0.0.1:63869
1165 2016-08-25 18:21:21,696 [IPC Server handler 3 on 63869] INFO
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true
ugi=manoj (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/test
dst=null perm=null proto=rpc
1166 2016-08-25 18:21:21,698 [IPC Server handler 2 on 63869] DEBUG
blockmanagement.BlockManager (BlockManager.java:createLocatedBlocks(1109)) -
blocks = [blk_1073741825_1002]
1167 2016-08-25 18:21:21,698 [IPC Server handler 2 on 63869] INFO
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true
ugi=manoj (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test
dst=null perm=null proto=rpc
1168 Block 0 of file /test has replication factor 2 (desired 3); locations
127.0.0.1:63874 127.0.0.1:63870
1169 2016-08-25 18:21:22,078 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 1 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid a8be1f44-ba6d-4a90-a380-ccab72e69aeb) service to
localhost/127.0.0.1:63869
1170 2016-08-25 18:21:22,691 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid 31f5b7d6-6d2e-4410-8ca8-27bfe5b768fc) service to
localhost/127.0.0.1:63869
1171 2016-08-25 18:21:22,691 [DataNode:
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/]]
heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128
(Datanode Uuid ce51d074-8231-4ca6-8dff-21de2281910b) service to
localhost/127.0.0.1:63869
1172 2016-08-25 18:21:22,703 [IPC Server handler 4 on 63869] INFO
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true
ugi=manoj (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/test
dst=null perm=null proto=rpc
1173 2016-08-25 18:21:22,704 [IPC Server handler 5 on 63869] DEBUG
blockmanagement.BlockManager (BlockManager.java:createLocatedBlocks(1109)) -
blocks = [blk_1073741825_1002]
1174 2016-08-25 18:21:22,704 [IPC Server handler 5 on 63869] INFO
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true
ugi=manoj (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/test
dst=null perm=null proto=rpc
1175 Block 0 of file /test has replication factor 2 (desired 3); locations
127.0.0.1:63870 127.0.0.1:63874
{noformat}
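The key line in the snippet above is "expected size is 1 but only 0 storage types can be selected": the HOT policy prefers DISK and falls back to ARCHIVE for replication, and once both are marked unavailable nothing is left to select for the missing replica. As a rough illustration only (a hypothetical mini-model of the selection loop implied by the log, not the actual BlockStoragePolicy.chooseStorageTypes() implementation):

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch of the storage-type fallback behavior implied by the
// log above; the real logic lives in BlockStoragePolicy.chooseStorageTypes().
public class StoragePolicySketch {
    enum StorageType { DISK, ARCHIVE }

    // HOT policy: preferred type DISK, replication fallback ARCHIVE.
    static List<StorageType> chooseStorageTypes(int needed,
                                                EnumSet<StorageType> unavailable) {
        List<StorageType> selected = new ArrayList<>();
        for (int i = 0; i < needed; i++) {
            if (!unavailable.contains(StorageType.DISK)) {
                selected.add(StorageType.DISK);      // preferred type
            } else if (!unavailable.contains(StorageType.ARCHIVE)) {
                selected.add(StorageType.ARCHIVE);   // replication fallback
            }
            // Both DISK and ARCHIVE unavailable: nothing can be selected
            // for this replica, matching "only 0 storage types can be selected".
        }
        return selected;
    }

    public static void main(String[] args) {
        // Mirrors the log: still in need of 1 replica, DISK already removed,
        // ARCHIVE fallback also unavailable -> selected is empty.
        List<StorageType> s = chooseStorageTypes(
            1, EnumSet.of(StorageType.DISK, StorageType.ARCHIVE));
        System.out.println("selected=" + s.size()); // prints "selected=0"
    }
}
```

With selection exhausted every pass of the ReplicationMonitor, the block stays in neededReconstruction and replication never catches up, which matches the repeating pattern in the log.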
> BlockManager fails to store a good block for a datanode storage after it
> reported a corrupt block — block replication stuck
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10819
> URL: https://issues.apache.org/jira/browse/HDFS-10819
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test
> testRemoveVolumeBeingWrittenForDatanode. A data write pipeline can run into
> issues such as timeouts or an unreachable datanode; in this test case the
> failure is induced by removing one of a datanode's volumes while a block
> write is in progress. Digging further into the logs, when the problem occurs
> in the write pipeline, error recovery does not happen as expected, so block
> replication never catches up.
> Though this problem has the same signature as HDFS-10780, the logs suggest
> the code paths taken are totally different, so the root cause could be
> different as well.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]