[ https://issues.apache.org/jira/browse/HDFS-10819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15452935#comment-15452935 ]

Manoj Govindassamy commented on HDFS-10819:
-------------------------------------------


Here is a log snippet from a failed run of the test; its signature is entirely 
different from the ones seen in HDFS-10780:

{noformat}

1148 2016-08-25 18:21:19,853 [ReplicationMonitor] WARN  net.NetworkTopology 
(NetworkTopology.java:chooseRandom(816)) - Failed to find datanode (scope="" 
excludedScope="/default-rack").
1149 2016-08-25 18:21:19,853 [ReplicationMonitor] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(402)) - Failed to place enough 
replicas, still in need of 1 to reach 3 (unavailableStorages=[], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
1150 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN  net.NetworkTopology 
(NetworkTopology.java:chooseRandom(816)) - Failed to find datanode (scope="" 
excludedScope="/default-rack").
1151 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(402)) - Failed to place enough 
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more 
information, please enable DEBUG log level on 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
1152 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN  
protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) - 
Failed to place enough replicas: expected size is 1 but only 0 storage types 
can be selected (replication=3, selected=[], unavailable=[DISK, ARCHIVE], 
removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
1153 2016-08-25 18:21:19,854 [ReplicationMonitor] WARN  
blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseTarget(402)) - Failed to place enough 
replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK, ARCHIVE], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All 
required storage types are unavailable:  unavailableStorages=[DISK, ARCHIVE], 
storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
1154 2016-08-25 18:21:19,854 [ReplicationMonitor] DEBUG BlockStateChange 
(BlockManager.java:computeReconstructionWorkForBlocks(1680)) - BLOCK* 
neededReconstruction = 1 pendingReconstruction = 0
1155 2016-08-25 18:21:20,071 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 1 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid a8be1f44-ba6d-4a90-a380-ccab72e69aeb) service to 
localhost/127.0.0.1:63869
1156 2016-08-25 18:21:20,684 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid ce51d074-8231-4ca6-8dff-21de2281910b) service to 
localhost/127.0.0.1:63869
1157 2016-08-25 18:21:20,684 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid 31f5b7d6-6d2e-4410-8ca8-27bfe5b768fc) service to 
localhost/127.0.0.1:63869
1158 2016-08-25 18:21:20,688 [IPC Server handler 0 on 63869] INFO  
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true     
  ugi=manoj (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo src=/test       
dst=null        perm=null       proto=rpc
1159 2016-08-25 18:21:20,690 [IPC Server handler 9 on 63869] DEBUG 
blockmanagement.BlockManager (BlockManager.java:createLocatedBlocks(1109)) - 
blocks = [blk_1073741825_1002]
1160 2016-08-25 18:21:20,690 [IPC Server handler 9 on 63869] INFO  
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true     
  ugi=manoj (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/test       
dst=null        perm=null       proto=rpc
1161 Block 0 of file /test has replication factor 2 (desired 3); locations 
127.0.0.1:63870 127.0.0.1:63874
1162 2016-08-25 18:21:21,072 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 1 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid a8be1f44-ba6d-4a90-a380-ccab72e69aeb) service to 
localhost/127.0.0.1:63869
1163 2016-08-25 18:21:21,690 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid ce51d074-8231-4ca6-8dff-21de2281910b) service to 
localhost/127.0.0.1:63869
1164 2016-08-25 18:21:21,690 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid 31f5b7d6-6d2e-4410-8ca8-27bfe5b768fc) service to 
localhost/127.0.0.1:63869
1165 2016-08-25 18:21:21,696 [IPC Server handler 3 on 63869] INFO  
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true     
  ugi=manoj (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo src=/test       
dst=null        perm=null       proto=rpc
1166 2016-08-25 18:21:21,698 [IPC Server handler 2 on 63869] DEBUG 
blockmanagement.BlockManager (BlockManager.java:createLocatedBlocks(1109)) - 
blocks = [blk_1073741825_1002]
1167 2016-08-25 18:21:21,698 [IPC Server handler 2 on 63869] INFO  
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true     
  ugi=manoj (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/test       
dst=null        perm=null       proto=rpc
1168 Block 0 of file /test has replication factor 2 (desired 3); locations 
127.0.0.1:63874 127.0.0.1:63870
1169 2016-08-25 18:21:22,078 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 1 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid a8be1f44-ba6d-4a90-a380-ccab72e69aeb) service to 
localhost/127.0.0.1:63869
1170 2016-08-25 18:21:22,691 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data5/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data6/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid 31f5b7d6-6d2e-4410-8ca8-27bfe5b768fc) service to 
localhost/127.0.0.1:63869
1171 2016-08-25 18:21:22,691 [DataNode: 
[[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/,
 
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data4/]]
  heartbeating to localhost/127.0.0.1:63869] DEBUG datanode.DataNode 
(BPServiceActor.java:sendHeartBeat(500)) - Sending heartbeat with 2 storage 
reports from service actor: Block pool BP-1210227499-172.16.3.66-1472174461128 
(Datanode Uuid ce51d074-8231-4ca6-8dff-21de2281910b) service to 
localhost/127.0.0.1:63869
1172 2016-08-25 18:21:22,703 [IPC Server handler 4 on 63869] INFO  
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true     
  ugi=manoj (auth:SIMPLE) ip=/127.0.0.1   cmd=getfileinfo src=/test       
dst=null        perm=null       proto=rpc
1173 2016-08-25 18:21:22,704 [IPC Server handler 5 on 63869] DEBUG 
blockmanagement.BlockManager (BlockManager.java:createLocatedBlocks(1109)) - 
blocks = [blk_1073741825_1002]
1174 2016-08-25 18:21:22,704 [IPC Server handler 5 on 63869] INFO  
FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7015)) - allowed=true     
  ugi=manoj (auth:SIMPLE) ip=/127.0.0.1   cmd=open        src=/test       
dst=null        perm=null       proto=rpc
1175 Block 0 of file /test has replication factor 2 (desired 3); locations 
127.0.0.1:63870 127.0.0.1:63874
{noformat}
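
One way to see the placement-failure progression above at a glance is to reduce 
the chooseTarget warnings to just their unavailableStorages values, which shows 
DISK becoming unavailable first and the ARCHIVE fallback then failing as well. 
A sketch (the sample lines are inlined here for illustration; in practice, pipe 
the real test log through the same sed filter):

```shell
# Condense the BlockPlacementPolicy warnings down to the unavailableStorages
# progression. Sample lines are inlined for illustration only; replace the
# printf with e.g. `cat <test-log-file>` against the actual test output.
printf '%s\n' \
  'WARN ... (unavailableStorages=[], storagePolicy=...)' \
  'WARN ... (unavailableStorages=[DISK], storagePolicy=...)' \
  'WARN ... (unavailableStorages=[DISK, ARCHIVE], storagePolicy=...)' |
sed -n 's/.*unavailableStorages=\(\[[^]]*\]\).*/\1/p'
# prints:
# []
# [DISK]
# [DISK, ARCHIVE]
```
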

> BlockManager fails to store a good block for a datanode storage after it 
> reported a corrupt block — block replication stuck
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10819
>                 URL: https://issues.apache.org/jira/browse/HDFS-10819
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test 
> testRemoveVolumeBeingWrittenForDatanode. A data write pipeline can run into 
> issues such as timeouts or an unreachable datanode; in this test the failure 
> is deliberately induced by removing one of a datanode's volumes while a block 
> write is in progress. Digging further into the logs, when the problem occurs 
> in the write pipeline, error recovery does not happen as expected, so block 
> replication never catches up.
> Though this problem has the same signature as HDFS-10780, the logs suggest 
> the code paths taken are entirely different, so the root cause may be 
> different as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
