[ https://issues.apache.org/jira/browse/HDFS-17589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900282#comment-17900282 ]
ruiliang commented on HDFS-17589:
---------------------------------

[~yuyanlei] I used a clunky workaround to clean up these blk files:
1: Create an API on the datanode that reports the blk file info on its disks. Python file: [^DiskBlkAPI.py]
2: The client analyzes the "EC reconstruct striped block" log (which has to be collected separately), compares the blk files on each datanode IP's disks with the blks the namenode knows about, and then cleans up the surplus blks if the disk holds too many. Python file: [^FindDirtyBlk.py]

^case:^
^!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=39900719750b4e0a8041ef4974cbf0b5&docGuid=8WuVFWCqJw3yXz&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjNOVEZhejVrT1giLCJkb2NJZCI6IjhXdVZGV0NxSnczeVh6In0..TpGVchjIee0exDgK.Tlrt3kCoLtePC8ZLP28r7QkUNT9WgZa36FzOb3lhQ-QhxyIsSAMVS2CGnWDX8GL4MDgX8q0rudXldMmGenw9ewod52s26beRINgPw6zQ9aPE3luENVW4Ms_rwagDAUe4VKSKDJK-HNNH_vInMuR-nJMBvy_rfkxkaLJIl3rlfjwLy1I7GxaDuBep9ju0aYhXdCToDYDQrZW0vf76XGBjAfFUzQ.RCh7lfsQ_p3bJyOL2jfZIQ!^

^After treatment:^
^!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=f50c1a93291949ec9a1a8de8e935519b&docGuid=8WuVFWCqJw3yXz&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjNOVEZhejVrT1giLCJkb2NJZCI6IjhXdVZGV0NxSnczeVh6In0..TpGVchjIee0exDgK.Tlrt3kCoLtePC8ZLP28r7QkUNT9WgZa36FzOb3lhQ-QhxyIsSAMVS2CGnWDX8GL4MDgX8q0rudXldMmGenw9ewod52s26beRINgPw6zQ9aPE3luENVW4Ms_rwagDAUe4VKSKDJK-HNNH_vInMuR-nJMBvy_rfkxkaLJIl3rlfjwLy1I7GxaDuBep9ju0aYhXdCToDYDQrZW0vf76XGBjAfFUzQ.RCh7lfsQ_p3bJyOL2jfZIQ!^

> hdfs EC data new blk reconstruct old blk not delete
> ------------------------------------------------------
>
> Key: HDFS-17589
> URL: https://issues.apache.org/jira/browse/HDFS-17589
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: ruiliang
> Priority: Major
> Attachments: DiskBlkAPI.py, FindDirtyBlk.py
>
>
> The cause is that the cluster was faulty earlier: Datanodes kept losing
> their connections and recovering, resulting in a lot of EC data
> reconstruction, but many old blk files were not cleaned up correctly. Has this
> been fixed? Which patch do I need to apply? Thank you.
> The following is a detailed check log:
>
> ok: blk_-9223372036371044652 in 10.12.66.225
> {color:#de350b}error: blk_-9223372036371044652 in 10.12.66.154(/data3/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652){color}
> {color:#de350b}Why was it not deleted?{color}
>
> {code:java}
> ====datanode delete data ec blk ?
> grep blk_-9223372036371044656 hadoop-hdfs-root-datanode-fs-hiido-dn-12-66-111.hiido.host.xxx.com.log
> 2024-07-18 17:25:07,879 INFO datanode.DataNode (DataXceiver.java:writeBlock(738)) - Receiving BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036371044656_1688858793 src: /10.12.66.111:25066 dest: /10.12.66.111:1019
> 2024-07-18 17:25:17,396 INFO datanode.DataNode (StripedBlockReconstructor.java:run(86)) - ok EC reconstruct striped block: BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036371044656_1688858793 blockId: -9223372036371044656
> 2024-07-18 17:25:17,396 INFO datanode.DataNode (DataXceiver.java:writeBlock(914)) - Received BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036371044656_1688858793 src: /10.12.66.111:25066 dest: /10.12.66.111:1019 of size 193986560
> 2024-07-18 17:25:25,465 INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(225)) - Scheduling blk_-9223372036371044656_1688858793 replica FinalizedReplica, blk_-9223372036371044656_1688858793, FINALIZED
>   getBlockURI() = file:/data4/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044656 for deletion
> 2024-07-18 17:25:25,746 INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(333)) - Deleted BP-1822992414-10.12.65.48-1660893388633 blk_-9223372036371044656_1688858793 URI file:/data4/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044656
> =============my config
> dfs.blockreport.intervalMsec =21600000
> ============namenode3 log
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-18 04:34:39,523 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-18 04:34:40,131 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-18 10:34:38,950 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-18 10:34:39,559 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log:2024-07-18 16:34:38,564 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log:2024-07-18 16:34:39,190 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 04:34:39,462 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 04:34:40,083 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 10:34:39,686 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 10:34:40,295 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 16:34:39,667 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 16:34:40,301 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 22:34:38,187 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.xxxyy.com.log.1:2024-07-17 22:34:38,794 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> =====namenode2 log active
> grep blk_-9223372036371044656 hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn2.hiido.host.xxx.com.log.10
> 2024-07-18 17:25:04,786 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> 2024-07-18 17:25:05,703 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> ========namenode2 log
> root@fs-hiido-yycluster06-yynn1:/data/logs/hadoop/hdfs# grep blk_-9223372036371044656 hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn1.hiido.host.xxx.com.log
> 2024-07-18 07:20:41,525 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> 2024-07-18 07:20:42,049 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> 2024-07-18 13:20:40,726 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> 2024-07-18 13:20:41,251 WARN BlockStateChange (BlockManager.java:addStoredBlock(3238)) - BLOCK* addStoredBlock: block blk_-9223372036371044656_1688858793 moved to storageType DISK on node 10.12.66.154:1019
> ================
> hdfs fsck -fs hdfs://yycluster06 -blockId blk_-9223372036371044656
> Connecting to namenode via http://fs-hiido-yycluster06-yynn2.hiido.host.xxx.com:50070/fsck?ugi=hdfs&blockId=blk_-9223372036371044656+&path=%2F
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 at Thu Jul 18 17:57:06 CST 2024
> Block Id: blk_-9223372036371044656
> Block belongs to: /hive_warehouse/yydw.db/dwv_event_detail_mob_quality_day/dt=2021-08-10/product_id=171/part-00210-f6fac929-f172-45cf-9fb7-0aa2f68c545e.c000.gz
> No. of Expected Replica: 5
> No. of live Replica: 5
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: fs-hiido-dn-12-66-225.hiido.host.xxxyy.com/4F08-02-03 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-67-38.hiido.host.xxxyy.com/4F08-02-15 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-66-191.hiido.host.xxx.com/4F08-12-06 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-67-5.hiido.host.xxxyy.com/4F08-12-09 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-66-154.hiido.host.xxxyy.com/4F08-02-13 is HEALTHY
> hdfs fsck -fs hdfs://yycluster06 /hive_warehouse/yydw.db/dwv_event_detail_mob_quality_day/dt=2021-08-10/product_id=171/part-00210-f6fac929-f172-45cf-9fb7-0aa2f68c545e.c000.gz -files -blocks -locations
> /hive_warehouse/yydw.db/dwv_event_detail_mob_quality_day/dt=2021-08-10/product_id=171/part-00210-f6fac929-f172-45cf-9fb7-0aa2f68c545e.c000.gz 581647843 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s): OK
> 0. BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036371044656_1688858793 len=581647843 Live_repl=5 [
>   blk_-9223372036371044656:DatanodeInfoWithStorage[10.12.66.154:1019,DS-4b66fe61-93ca-4f8d-8fe0-e00f2ed09e82,DISK],
>   blk_-9223372036371044655:DatanodeInfoWithStorage[10.12.67.5:1019,DS-af799695-8e9b-4884-a741-7a0742db6f79,DISK],
>   blk_-9223372036371044654:DatanodeInfoWithStorage[10.12.66.191:1019,DS-cef171fa-8e8e-43bb-9fd1-aab84ea046cd,DISK],
>   blk_-9223372036371044653:DatanodeInfoWithStorage[10.12.67.38:1019,DS-d2346a26-14c4-41e9-b349-e198ab5b684e,DISK],
>   blk_-9223372036371044652:DatanodeInfoWithStorage[10.12.66.225:1019,DS-13c19c1d-9221-45cc-919f-b54ada9cab15,DISK]]
> ========================================================================================================================================
> root@fs-hiido-dn-12-66-154:/data/logs/hadoop/hdfs# ll /data*/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir*/subdir*/blk_-922337203637104465*
> -rw-r--r-- 1 hdfs hdfs 193986560 Feb 26 19:54 /data12/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044656
> -rw-r--r-- 1 hdfs hdfs 1515527 Feb 26 19:54 /data12/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044656_1688858793.meta
> -rw-r--r-- 1 hdfs hdfs 193986560 Feb 27 21:49 /data3/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652
> -rw-r--r-- 1 hdfs hdfs 1515527 Feb 27 21:49 /data3/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652_1688858793.meta
> root@fs-hiido-dn-12-67-5:/home/liangrui# ll /data*/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir*/subdir*/blk_-922337203637104465*
> -rw-r--r-- 1 hdfs hdfs 193986560 Nov 29 2023 /data12/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044655
> -rw-r--r-- 1 hdfs hdfs 1515527 Nov 29 2023 /data12/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044655_1688858793.meta
> root@fs-hiido-dn-12-66-191:/home/liangrui# ll /data*/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir*/subdir*/blk_-922337203637104465*
> -rw-r--r-- 1 hdfs hdfs 193674723 Aug 30 2023 /data4/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044654
> -rw-r--r-- 1 hdfs hdfs 1513091 Aug 30 2023 /data4/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044654_1688858793.meta
> root@fs-hiido-dn-12-67-38:/home/liangrui# ll /data*/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir*/subdir*/blk_-922337203637104465*
> -rw-r--r-- 1 hdfs hdfs 193986560 Apr 19 18:20 /data1/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044653
> -rw-r--r-- 1 hdfs hdfs 1515527 Apr 19 18:20 /data1/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044653_1688858793.meta
> root@fs-hiido-dn-12-66-225:/home/liangrui# ll /data*/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir*/subdir*/blk_-922337203637104465*
> -rw-r--r-- 1 hdfs hdfs 193986560 Dec 1 2023 /data10/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652
> -rw-r--r-- 1 hdfs hdfs 1515527 Dec 1 2023 /data10/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652_1688858793.meta
> ========
> root@fs-hiido-dn-12-66-154:/data/logs/hadoop/hdfs# md5sum /data3/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652
> b661b6d711d753a82c3bf42bb2ceec51 /data3/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652
> 10.12.66.225
> md5sum /data10/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652
> b661b6d711d753a82c3bf42bb2ceec51 /data10/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir21/subdir6/blk_-9223372036371044652
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
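
The comparison described in step 2 of the comment (disk blk files per datanode versus the namenode's view) can be sketched roughly as follows. This is only an illustration, not the contents of FindDirtyBlk.py: the function name `find_dirty_blocks` and the two dictionaries are hypothetical, the data is copied by hand from the fsck and `ll` output quoted in the issue, and the hostname-to-IP mapping (for example fs-hiido-dn-12-66-154 as 10.12.66.154) is assumed from the fsck locations. It only reports candidates and deletes nothing.

```python
import re

# Namenode view, copied from the `hdfs fsck ... -files -blocks -locations` output:
# internal EC block ID -> datanode IP the namenode expects to hold it.
NAMENODE_LOCATIONS = {
    -9223372036371044656: "10.12.66.154",
    -9223372036371044655: "10.12.67.5",
    -9223372036371044654: "10.12.66.191",
    -9223372036371044653: "10.12.67.38",
    -9223372036371044652: "10.12.66.225",
}

# Common directory suffix of every blk file in the `ll` listings.
FINALIZED = ("hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633"
             "/current/finalized/subdir21/subdir6")

# Disk view, copied from the `ll` listings: datanode IP -> blk files on its disks.
DISK_FILES = {
    "10.12.66.154": [f"/data12/{FINALIZED}/blk_-9223372036371044656",
                     f"/data3/{FINALIZED}/blk_-9223372036371044652"],
    "10.12.67.5": [f"/data12/{FINALIZED}/blk_-9223372036371044655"],
    "10.12.66.191": [f"/data4/{FINALIZED}/blk_-9223372036371044654"],
    "10.12.67.38": [f"/data1/{FINALIZED}/blk_-9223372036371044653"],
    "10.12.66.225": [f"/data10/{FINALIZED}/blk_-9223372036371044652"],
}

# Matches a block data file name; .meta files do not end in digits and are skipped.
BLK_NAME = re.compile(r"blk_(-?\d+)$")


def find_dirty_blocks(namenode_locations, disk_files):
    """Return (ip, block_id, path) for blk files stored on a datanode that the
    namenode maps to a different datanode. Blocks the namenode does not list
    at all are left alone, since this sketch cannot judge them."""
    dirty = []
    for ip, paths in sorted(disk_files.items()):
        for path in paths:
            match = BLK_NAME.search(path)
            if match is None:
                continue
            block_id = int(match.group(1))
            expected_ip = namenode_locations.get(block_id)
            if expected_ip is not None and expected_ip != ip:
                dirty.append((ip, block_id, path))
    return dirty


for ip, block_id, path in find_dirty_blocks(NAMENODE_LOCATIONS, DISK_FILES):
    print(f"error: blk_{block_id} in {ip}({path})")
```

With the data above, the only file flagged is the /data3 copy of blk_-9223372036371044652 on 10.12.66.154, which matches the "error" line in the check log. Any real cleanup should re-verify against a fresh fsck before removing anything.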