[
https://issues.apache.org/jira/browse/HDFS-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hong Chen updated HDFS-16578:
-----------------------------
Description:
There were no missing blocks while NN1 was the active NameNode (ANN). After NN2
transitioned to the active state (we stopped the zkfc of NN1), we found some
missing blocks in the NN2 web UI.
With NN2 as the ANN, we tested one of the corrupted files with "hadoop fs -get
/user/xxx/d=2020-01-03/000154_0.lzo ."; the file is not readable.
{panel:title=Exception}
get: Could not obtain block:
BP-459146894-xxx-1581848181424:blk_1081077638_7337053
file=/user/xxx/d=2020-01-03/000154_0.lzo
{panel}
With NN1 as the ANN, we ran fsck on /user/xxx/d=2020-01-03/000154_0.lzo, and it
reports the file as healthy.
{panel:title=Fsck log}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s): OK
0. BP-459146894-xxx-1581848181424:blk_1081077638_7337053
len=1555552 Live_repl=2
DatanodeInfoWithStorage[datanode1:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
[DatanodeInfoWithStorage[datanode2:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
Status: HEALTHY
Total size: 1555552 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1555552 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2400
Number of racks: 90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
Then we checked blk_1081077638_7337053 in the datanode logs; both datanodes
show the replica being deleted:
{panel:title=datanode1}
2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
volume
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO [Async disk worker #10145 for
volume
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
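To repeat this check for other missing blocks, the following Python sketch scans datanode log lines for the "Scheduling ... for deletion" and "Deleted ..." messages shown above. It assumes each log record sits on a single line (the excerpts above are wrapped by the mail client) and that the message wording matches these excerpts. The names find_deletions, DeletionEvent and SAMPLE_LOG are hypothetical helpers for illustration, not part of Hadoop.

```python
import re
from typing import Iterable, List, NamedTuple


class DeletionEvent(NamedTuple):
    timestamp: str  # e.g. "2022-05-10 12:00:42,984"
    action: str     # "scheduled" or "deleted"
    path: str       # replica file path on the datanode volume


def find_deletions(lines: Iterable[str], block: str) -> List[DeletionEvent]:
    """Return deletion-related log events for one block (ID plus generation stamp)."""
    ts = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    blk = re.escape(block)
    # Message shapes are assumed from the log excerpts in this report.
    patterns = (
        ("scheduled",
         re.compile(ts + r".*Scheduling " + blk + r" file (?P<path>\S+) for deletion")),
        ("deleted",
         re.compile(ts + r".*Deleted \S+ " + blk + r" file (?P<path>\S+)")),
    )
    events = []
    for line in lines:
        for action, pattern in patterns:
            match = pattern.search(line)
            if match:
                events.append(
                    DeletionEvent(match.group("ts"), action, match.group("path")))
    return events


# The two datanode1 records from this report, joined back onto single lines.
_REPLICA = ("/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/"
            "finalized/subdir15/subdir15/blk_1081077638")
SAMPLE_LOG = [
    "2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424 "
    "heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling "
    "blk_1081077638_7337053 file " + _REPLICA + " for deletion",
    "2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for "
    "volume /mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] - "
    "Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file " + _REPLICA,
]

if __name__ == "__main__":
    for event in find_deletions(SAMPLE_LOG, "blk_1081077638_7337053"):
        print(event.timestamp, event.action, event.path)
```

Running the same function over the full log of each datanode listed by fsck gives the deletion time per replica, which can then be compared with the time of the failover.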
was:
There were no missing blocks while NN1 was the active NameNode (ANN). After NN2
transitioned to the active state (we stopped the zkfc of NN1), we found some
missing blocks in the NN2 web UI.
We tested the file with "hadoop fs -get /user/xxx/d=2020-01-03/000154_0.lzo .";
the file is not readable.
{panel:title=Exception}
get: Could not obtain block:
BP-459146894-xxx-1581848181424:blk_1081077638_7337053
file=/user/xxx/d=2020-01-03/000154_0.lzo
{panel}
With NN1 as the ANN, we ran fsck on /user/xxx/d=2020-01-03/000154_0.lzo, and it
reports the file as healthy.
{panel:title=Fsck log}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s): OK
0. BP-459146894-xxx-1581848181424:blk_1081077638_7337053
len=1555552 Live_repl=2
DatanodeInfoWithStorage[datanode1:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
[DatanodeInfoWithStorage[datanode2:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
Status: HEALTHY
Total size: 1555552 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1555552 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2400
Number of racks: 90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
Then we checked blk_1081077638_7337053 in the datanode logs:
{panel:title=datanode1}
2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
volume
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO [Async disk worker #10145 for
volume
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
> Missing blocks appeared after snn has transitioned to active state
> -------------------------------------------------------------------
>
> Key: HDFS-16578
> URL: https://issues.apache.org/jira/browse/HDFS-16578
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 2.9.2
> Reporter: Hong Chen
> Priority: Critical