[
https://issues.apache.org/jira/browse/HDFS-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hong Chen updated HDFS-16578:
-----------------------------
Description:
There were no missing blocks on NN1 (the active NameNode, ANN). After NN2
transitioned to the active state because we stopped the ZKFC on NN1, we found
some missing blocks in the NN2 web UI.
{panel:title=Exception}
hadoop fs -get /user/xxx/d=2020-01-03/000154_0.lzo .
get: Could not obtain block:
BP-459146894-xxx-1581848181424:blk_1081077638_7337053
file=/user/xxx/d=2020-01-03/000154_0.lzo
{panel}
{panel:title=/user/xxx/d=2020-01-03/000154_0.lzo file fscklog at NN1}
/user/xxx/d=2020-01-03/000154_0.lzo 1555552 bytes, 1 block(s): OK
0. BP-459146894-xxx-1581848181424:blk_1081077638_7337053
len=1555552 Live_repl=2
DatanodeInfoWithStorage[datanode1:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
[DatanodeInfoWithStorage[datanode2:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
Status: HEALTHY
Total size: 1555552 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1555552 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2400
Number of racks: 90
FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
{panel}
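The fsck output above names the DataNodes that NN1 believes hold the block, which is what tells us whose logs to inspect. A minimal sketch of extracting those locations, assuming the `DatanodeInfoWithStorage[host:port,storage-id,type]` layout shown in the panel (the function name `replica_locations` is hypothetical and not part of Hadoop):

```python
import re

# Matches one "DatanodeInfoWithStorage[addr,storageId,storageType]" entry
# as it appears in the fsck output quoted in this report.
LOCATION = re.compile(
    r"DatanodeInfoWithStorage\[([^,\]]+),(DS-[0-9a-fA-F-]+),([A-Z_]+)\]"
)


def replica_locations(fsck_text):
    """Return (address, storage_id, storage_type) for every replica listed."""
    return LOCATION.findall(fsck_text)


# Sample copied from the fsck panel (colour markup removed).
SAMPLE = """
0. BP-459146894-xxx-1581848181424:blk_1081077638_7337053
len=1555552 Live_repl=2
DatanodeInfoWithStorage[datanode1:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
[DatanodeInfoWithStorage[datanode2:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
"""

for address, storage_id, storage_type in replica_locations(SAMPLE):
    print(address, storage_id, storage_type)
```

The addresses returned this way are only NN1's view of the block map; they say nothing about whether the replica files still exist on disk.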
We then checked the DataNode logs for blk_1081077638_7337053:
{panel:title=datanode1}
2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
volume
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
{panel:title=datanode2}
2021-11-29 16:27:07,411 [2765933340] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2021-11-29 16:27:08,587 [2765934516] - INFO [Async disk worker #10145 for
volume
/mnt/dfs/5/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/5/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
{panel}
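Both DataNode logs show the replica being scheduled for deletion and then deleted, even though NN1 still reported two live replicas. A minimal sketch of scanning a DataNode log for such records, assuming the record layout quoted above (a leading `yyyy-MM-dd HH:mm:ss,SSS` timestamp and the phrases "Scheduling ... for deletion" and "Deleted"); the helper names `iter_records` and `find_deletions` are hypothetical:

```python
import re

# A record starts with a timestamp such as "2022-05-10 12:00:42,984".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def iter_records(log_text):
    """Join wrapped lines so that each yielded record starts with a timestamp."""
    current = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) and current:
            yield " ".join(current)
            current = []
        current.append(line)
    if current:
        yield " ".join(current)


def find_deletions(log_text, block_name):
    """Return (timestamp, action) pairs for deletion records of block_name."""
    # Guard against matching a longer block ID that shares the same prefix.
    wanted = re.compile(re.escape(block_name) + r"(?!\d)")
    events = []
    for record in iter_records(log_text):
        stamp = TIMESTAMP.match(record)
        if stamp is None or not wanted.search(record):
            continue
        if "Scheduling" in record and "for deletion" in record:
            events.append((stamp.group(0), "scheduled"))
        elif "Deleted" in record:
            events.append((stamp.group(0), "deleted"))
    return events


# Sample copied from the datanode1 panel (colour markup removed).
SAMPLE = """
2022-05-10 12:00:42,984 [12699841344] - INFO [BP-459146894-xxx-1581848181424
heartbeating to xxx/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling
blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
for deletion
2022-05-10 12:00:44,409 [12699842769] - INFO [Async disk worker #46179 for
volume
/mnt/dfs/9/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] -
Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file
/mnt/dfs/9/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
"""

for when, action in find_deletions(SAMPLE, "blk_1081077638"):
    print(when, action)
```

Running the same scan on each DataNode that fsck lists gives the deletion times (here 2022-05-10 on datanode1 and 2021-11-29 on datanode2), which can then be compared with the time of the fsck run on NN1.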
> Missing blocks appeared after snn has transitioned to active state
> -------------------------------------------------------------------
>
> Key: HDFS-16578
> URL: https://issues.apache.org/jira/browse/HDFS-16578
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 2.9.2
> Reporter: Hong Chen
> Priority: Critical