[
https://issues.apache.org/jira/browse/HDFS-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hong Chen updated HDFS-16578:
-----------------------------
Description:
There is no missing blocks in NN1, after NN2 has transitioned to active state
by stopping zkfc of NN1, we have found some missing blocks in NN2 webui, and we
test the corrupted file, "hadoop fs -get /user/xxx/d=2020-01-03/000154_0.lzo
.", it is not readable.
{panel:title=Exception}
get: Could not obtain block:
BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color}
file=/user/xxx/d=2020-01-03/000154_0.lzo
{panel}
when NN1 is ANN, we fsck /user/xxx/d=2020-01-03/000154_0.lzo, but it is healthy.
{panel:title=Fscklog}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s): OK
0. BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color}
len=1555552 {color:#172b4d}Live_repl=2{color}
DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
[DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
Status: HEALTHY
Total size: 1555552 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1555552 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2400
Number of racks: 90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
then we we check the {color:#172b4d}blk_1081077638_7337053{color} in datanodes
log
{panel:title=datanode0}
2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:DataNode@2255] -
DatanodeRegistration({color:#4c9aff}datanode0{color}:1004,
datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006,
infoSecurePort=0, ipcPort=8025,
storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
Starting thread to transfer
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to
{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:08:10,468 [1728658240] - INFO
[org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
- DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=1555552) to
/{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for volume
/mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode1}
2021-11-10 16:08:10,453 [165787777] - INFO [DataXceiver for client at
/datanode0:54958 [Receiving block
BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@717] -
Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src:
/{color:#4c9aff}datanode0{color}:54958 dest:
/{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:08:10,480 [165787804] - INFO [DataXceiver for client at
/{color:#4c9aff}datanode0{color}:54958 [Receiving block
BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] -
Received BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src:
/{color:#4c9aff}datanode0{color}:54958 dest:
/{color:#4c9aff}datanode1{color}:1004 of size 1555552
2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
{color:#172b4d}blk_1081077638_7337053{color} file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
volume
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
{color:#172b4d}blk_1081077638_7337053{color} file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO [Async disk worker #10145 for
volume
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
was:
There is no missing blocks in NN1, after NN2 has transitioned to active state
by stopping zkfc of NN1, we have found some missing blocks in NN2 webui, and we
test the corrupted file, "hadoop fs -get /user/xxx/d=2020-01-03/000154_0.lzo
.", it is not readable.
{panel:title=Exception}
get: Could not obtain block:
BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color}
file=/user/xxx/d=2020-01-03/000154_0.lzo
{panel}
when NN1 is ANN, we fsck /user/xxx/d=2020-01-03/000154_0.lzo, but it is healthy.
{panel:title=Fscklog}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s): OK
0. BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color}
len=1555552 {color:#172b4d}Live_repl=2{color}
DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
[DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
Status: HEALTHY
Total size: 1555552 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1555552 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2400
Number of racks: 90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
then we we check the {color:#172b4d}blk_1081077638_7337053{color} in datanodes
log
{panel:title=datanode0}
2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:DataNode@2255] -
DatanodeRegistration({color:#4c9aff}datanode0{color}:1004,
datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006,
infoSecurePort=0, ipcPort=8025,
storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
Starting thread to transfer
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to
{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:08:10,468 [1728658240] - INFO
[org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
- DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted
BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=1555552) to
/{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for volume
/mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode1}
2021-11-10 16:08:10,453 [165787777] - INFO [DataXceiver for client at
/datanode0:54958 [Receiving block
BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053]:DataXceiver@717]
- Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src:
/{color:#4c9aff}datanode0{color}:54958 dest:
/{color:#4c9aff}datanode1{color}:1004
2021-11-10 16:08:10,480 [165787804] - INFO [DataXceiver for client at
/{color:#4c9aff}datanode0{color}:54958 [Receiving block
BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] -
Received BP-459146894-10.196.80.35-1581848181424:blk_1081077638_7337053 src:
/{color:#4c9aff}datanode0{color}:54958 dest:
/{color:#4c9aff}datanode1{color}:1004 of size 1555552
2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
{color:#172b4d}blk_1081077638_7337053{color} file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
volume
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO [BP-459146894-xxx-1581848181424
heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
{color:#172b4d}blk_1081077638_7337053{color} file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO [Async disk worker #10145 for
volume
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
> Missing blocks appeared after snn has transitioned to active state
> -------------------------------------------------------------------
>
> Key: HDFS-16578
> URL: https://issues.apache.org/jira/browse/HDFS-16578
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 2.9.2
> Reporter: Hong Chen
> Priority: Critical
>
> There is no missing blocks in NN1, after NN2 has transitioned to active state
> by stopping zkfc of NN1, we have found some missing blocks in NN2 webui, and
> we test the corrupted file, "hadoop fs -get
> /user/xxx/d=2020-01-03/000154_0.lzo .", it is not readable.
> {panel:title=Exception}
> get: Could not obtain block:
> BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color}
> file=/user/xxx/d=2020-01-03/000154_0.lzo
> {panel}
> when NN1 is ANN, we fsck /user/xxx/d=2020-01-03/000154_0.lzo, but it is
> healthy.
> {panel:title=Fscklog}
> /user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s): OK
> 0.
> BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color}
> len=1555552 {color:#172b4d}Live_repl=2{color}
> DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
>
> [DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
> Status: HEALTHY
> Total size: 1555552 B
> Total dirs: 0
> Total files: 1
> Total symlinks: 0
> Total blocks (validated): 1 (avg. block size 1555552 B)
> Minimally replicated blocks: 1 (100.0 %)
> Over-replicated blocks: 0 (0.0 %)
> Under-replicated blocks: 0 (0.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 3
> Average block replication: 2.0
> Corrupt blocks: 0
> Missing replicas: 0 (0.0 %)
> Number of data-nodes: 2400
> Number of racks: 90
> FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
> {panel}
> then we we check the {color:#172b4d}blk_1081077638_7337053{color} in
> datanodes log
> {panel:title=datanode0}
> 2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424
> heartbeating to NN1/xxx:8021:DataNode@2255] -
> DatanodeRegistration({color:#4c9aff}datanode0{color}:1004,
> datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006,
> infoSecurePort=0, ipcPort=8025,
> storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
> Starting thread to transfer
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to
> {color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:08:10,468 [1728658240] - INFO
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
> - DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=1555552) to
> /{color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
> blk_1081077638_7337053 file
> /mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> for deletion
> 2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for
> volume
> /mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321]
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
> /mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}
> {panel:title=datanode1}
> 2021-11-10 16:08:10,453 [165787777] - INFO [DataXceiver for client at
> /datanode0:54958 [Receiving block
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@717] -
> Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src:
> /{color:#4c9aff}datanode0{color}:54958 dest:
> /{color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:08:10,480 [165787804] - INFO [DataXceiver for client at
> /{color:#4c9aff}datanode0{color}:54958 [Receiving block
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] -
> Received BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src:
> /{color:#4c9aff}datanode0{color}:54958 dest:
> /{color:#4c9aff}datanode1{color}:1004 of size 1555552
> 2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
> {color:#172b4d}blk_1081077638_7337053{color} file
> /mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> for deletion
> 2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
> volume
> /mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321]
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
> /mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}
> {panel:title=datanode2}
> 2021-11-29 16:27:07,411 [2765933340] - INFO [BP-459146894-xxx-1581848181424
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
> {color:#172b4d}blk_1081077638_7337053{color} file
> /mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> for deletion
> 2021-11-29 16:27:08,587 [2765934516] - INFO [Async disk worker #10145 for
> volume
> /mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321]
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
> /mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]