[ 
https://issues.apache.org/jira/browse/HDFS-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Chen updated HDFS-16578:
-----------------------------
    Description: 
There is no missing blocks in NN1, after NN2 has transitioned to active state 
by stopping zkfc of NN1, we have found some missing blocks in NN2 webui, and we 
test the corrupted file, "hadoop fs -get /user/xxx/d=2020-01-03/000154_0.lzo 
.", it is not readable.
{panel:title=Exception}
get: Could not obtain block: 
BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
file=/user/xxx/d=2020-01-03/000154_0.lzo 
{panel}
when NN1 is ANN, we fsck /user/xxx/d=2020-01-03/000154_0.lzo, but it is healthy.
{panel:title=Fscklog}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s):  OK
0. BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
len=1555552 {color:#172b4d}Live_repl=2{color} 
DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
 
[DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]

Status: HEALTHY
 Total size:    1555552 B
 Total dirs:    0
 Total files:    1
 Total symlinks:        0
 Total blocks (validated):    1 (avg. block size 1555552 B)
 Minimally replicated blocks:    1 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        2400
 Number of racks:        90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
then we we check the {color:#172b4d}blk_1081077638_7337053{color} in datanodes 
log
{panel:title=datanode0}
2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:DataNode@2255] - 
DatanodeRegistration({color:#4c9aff}datanode0{color}:1004, 
datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006, 
infoSecurePort=0, ipcPort=8025, 
storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
 Starting thread to transfer 
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to 
{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:08:10,468 [1728658240] - INFO 
[org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
 - DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted 
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=1555552) to 
/{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
blk_1081077638_7337053 file 
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
 for deletion
2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for volume 
/mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - 
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode1}
2021-11-10 16:08:10,453 [165787777] - INFO  [DataXceiver for client  at 
/datanode0:54958 [Receiving block 
BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053]:DataXceiver@717]
 - Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src: 
/{color:#4c9aff}datanode0{color}:54958 dest: 
/{color:#4c9aff}datanode1{color}:1004

2021-11-10 16:08:10,480 [165787804] - INFO  [DataXceiver for client  at 
/{color:#4c9aff}datanode0{color}:54958 [Receiving block 
BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] - 
Received BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053 src: 
/{color:#4c9aff}datanode0{color}:54958 dest: 
/{color:#4c9aff}datanode1{color}:1004 of size 1555552

2022-05-10 12:00:42,984 [12699841344] - INFO  [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
{color:#172b4d}blk_1081077638_7337053{color} file 
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
 for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO  [Async disk worker #46179 for 
volume 
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - 
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO  [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
{color:#172b4d}blk_1081077638_7337053{color} file 
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
 for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO  [Async disk worker #10145 for 
volume 
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - 
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}

  was:
There is no missing blocks in NN1, after NN2 has transitioned to active state 
by stopping zkfc of NN1, we have found some missing blocks in NN2 webui.
we test the corrupted file, "hadoop fs -get /user/xxx/d=2020-01-03/000154_0.lzo 
.", it is not readable.
{panel:title=Exception}
get: Could not obtain block: 
BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
file=/user/xxx/d=2020-01-03/000154_0.lzo 
{panel}
when NN1 is ANN, we fsck /user/xxx/d=2020-01-03/000154_0.lzo, but it is healthy.
{panel:title=Fscklog}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s):  OK
0. BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
len=1555552 {color:#172b4d}Live_repl=2{color} 
DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
 
[DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]

Status: HEALTHY
 Total size:    1555552 B
 Total dirs:    0
 Total files:    1
 Total symlinks:        0
 Total blocks (validated):    1 (avg. block size 1555552 B)
 Minimally replicated blocks:    1 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        2400
 Number of racks:        90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
then we we check the {color:#172b4d}blk_1081077638_7337053{color} in datanodes 
log
{panel:title=datanode0}
2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:DataNode@2255] - 
DatanodeRegistration({color:#4c9aff}datanode0{color}:1004, 
datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006, 
infoSecurePort=0, ipcPort=8025, 
storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
 Starting thread to transfer 
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to 
{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:08:10,468 [1728658240] - INFO 
[org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
 - DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted 
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=1555552) to 
/{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
blk_1081077638_7337053 file 
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
 for deletion
2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for volume 
/mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - 
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode1}
2021-11-10 16:08:10,453 [165787777] - INFO  [DataXceiver for client  at 
/datanode0:54958 [Receiving block 
BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053]:DataXceiver@717]
 - Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src: 
/{color:#4c9aff}datanode0{color}:54958 dest: 
/{color:#4c9aff}datanode1{color}:1004

2021-11-10 16:08:10,480 [165787804] - INFO  [DataXceiver for client  at 
/{color:#4c9aff}datanode0{color}:54958 [Receiving block 
BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] - 
Received BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053 src: 
/{color:#4c9aff}datanode0{color}:54958 dest: 
/{color:#4c9aff}datanode1{color}:1004 of size 1555552

2022-05-10 12:00:42,984 [12699841344] - INFO  [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
{color:#172b4d}blk_1081077638_7337053{color} file 
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
 for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO  [Async disk worker #46179 for 
volume 
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - 
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO  [BP-459146894-xxx-1581848181424 
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
{color:#172b4d}blk_1081077638_7337053{color} file 
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
 for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO  [Async disk worker #10145 for 
volume 
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - 
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}


> Missing blocks appeared after snn has transitioned to active state 
> -------------------------------------------------------------------
>
>                 Key: HDFS-16578
>                 URL: https://issues.apache.org/jira/browse/HDFS-16578
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.9.2
>            Reporter: Hong Chen
>            Priority: Critical
>
> There is no missing blocks in NN1, after NN2 has transitioned to active state 
> by stopping zkfc of NN1, we have found some missing blocks in NN2 webui, and 
> we test the corrupted file, "hadoop fs -get 
> /user/xxx/d=2020-01-03/000154_0.lzo .", it is not readable.
> {panel:title=Exception}
> get: Could not obtain block: 
> BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
> file=/user/xxx/d=2020-01-03/000154_0.lzo 
> {panel}
> when NN1 is ANN, we fsck /user/xxx/d=2020-01-03/000154_0.lzo, but it is 
> healthy.
> {panel:title=Fscklog}
> /user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s):  OK
> 0. 
> BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
> len=1555552 {color:#172b4d}Live_repl=2{color} 
> DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
>  
> [DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
> Status: HEALTHY
>  Total size:    1555552 B
>  Total dirs:    0
>  Total files:    1
>  Total symlinks:        0
>  Total blocks (validated):    1 (avg. block size 1555552 B)
>  Minimally replicated blocks:    1 (100.0 %)
>  Over-replicated blocks:    0 (0.0 %)
>  Under-replicated blocks:    0 (0.0 %)
>  Mis-replicated blocks:        0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:    2.0
>  Corrupt blocks:        0
>  Missing replicas:        0 (0.0 %)
>  Number of data-nodes:        2400
>  Number of racks:        90
> FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
> {panel}
> then we we check the {color:#172b4d}blk_1081077638_7337053{color} in 
> datanodes log
> {panel:title=datanode0}
> 2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424 
> heartbeating to NN1/xxx:8021:DataNode@2255] - 
> DatanodeRegistration({color:#4c9aff}datanode0{color}:1004, 
> datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006, 
> infoSecurePort=0, ipcPort=8025, 
> storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
>  Starting thread to transfer 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to 
> {color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:08:10,468 [1728658240] - INFO 
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
>  - DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=1555552) to 
> /{color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424 
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
> blk_1081077638_7337053 file 
> /mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
>  for deletion
> 2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for 
> volume 
> /mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] 
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
> /mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}
> {panel:title=datanode1}
> 2021-11-10 16:08:10,453 [165787777] - INFO  [DataXceiver for client  at 
> /datanode0:54958 [Receiving block 
> BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053]:DataXceiver@717]
>  - Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src: 
> /{color:#4c9aff}datanode0{color}:54958 dest: 
> /{color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:08:10,480 [165787804] - INFO  [DataXceiver for client  at 
> /{color:#4c9aff}datanode0{color}:54958 [Receiving block 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] - 
> Received BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053 src: 
> /{color:#4c9aff}datanode0{color}:54958 dest: 
> /{color:#4c9aff}datanode1{color}:1004 of size 1555552
> 2022-05-10 12:00:42,984 [12699841344] - INFO  [BP-459146894-xxx-1581848181424 
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
> {color:#172b4d}blk_1081077638_7337053{color} file 
> /mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
>  for deletion
> 2022-05-10 12:00:44,409 [12699842769] - INFO  [Async disk worker #46179 for 
> volume 
> /mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] 
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
> /mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}
> {panel:title=datanode2}
> 2021-11-29 16:27:07,411 [2765933340] - INFO  [BP-459146894-xxx-1581848181424 
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
> {color:#172b4d}blk_1081077638_7337053{color} file 
> /mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
>  for deletion
> 2021-11-29 16:27:08,587 [2765934516] - INFO  [Async disk worker #10145 for 
> volume 
> /mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] 
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
> /mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to