[ 
https://issues.apache.org/jira/browse/HDFS-16739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16739:
----------------------------------
    Labels: pull-request-available  (was: )

> EC: Reconstruction failed when file has specified StoragePolicy
> ---------------------------------------------------------------
>
>                 Key: HDFS-16739
>                 URL: https://issues.apache.org/jira/browse/HDFS-16739
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>            Reporter: MingHui Luo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.3
>
>
> We found that BlockReconstructionWork uses the same chooseTarget function as
> Redundancy Block, so the number of targets returned can exceed the real
> additionalReplRequired, because extra targets are chosen to satisfy the
> storage policy. This causes various exceptions when the DN performs
> ECReconstructionWork.
> One of the exceptions on the DN is as follows:
> {code:java}
> 2022-08-24 03:01:39,534 WARN [Command processor] org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block blk_-9223372032283192848_35319673088
> java.lang.IllegalArgumentException: Too much missed striped blocks.
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.<init>(StripedWriter.java:87)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:45)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:134)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:797)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1306)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1344)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1280)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1267)
>  {code}
> The file's EC policy is RS-6-3-1024k. The internal block info is shown below:
> blk_-9223372032283192845 (index 3) needs to be reconstructed, and every storage
> is DISK, but the file's storage policy is ALL_SSD.
> {code:java}
> [blk_-9223372032283192848:DatanodeInfoWithStorage[10.x.x.33:50010,DS-e1435341-f43c-42ef-806f-90fsddfsfdcd,DISK],
> blk_-9223372032283192847:DatanodeInfoWithStorage[10.x.x.35:50010,DS-a6dsd16a-676a-4fed-8ffe-fsdfscw23445,DISK],
> blk_-9223372032283192846:DatanodeInfoWithStorage[10.x.x.34:50010,DS-40cdc124-e2e0-40f6-aa47-4d2bdsf3e8e5,DISK],
> blk_-9223372032283192844:DatanodeInfoWithStorage[10.x.x.21:50010,DS-ef9dee4f-dfb2-495c-872a-974dfscds58e,DISK],
> blk_-9223372032283192843:DatanodeInfoWithStorage[10.x.x.40:50010,DS-6dsedfa7-8291-46bb-964d-dfsf34567655,DISK],
> blk_-9223372032283192842:DatanodeInfoWithStorage[10.x.x.36:50010,DS-2dddc387-c38b-427d-9925-15a664d3472b,DISK],
> blk_-9223372032283192841:DatanodeInfoWithStorage[10.x.x.151:50010,DS-fds91a7-89ad-4899-bc44-675dfs32f58e,DISK],
> blk_-9223372032283192840:DatanodeInfoWithStorage[10.x.x.27:50010,DS-77dfs4c1-c23c-4b26-baa3-aadsfdff4118,DISK]]
>  {code}
> Here is the BlockECReconstructionInfo. Because none of the internal blocks
> satisfies the storage policy (ALL_SSD), the target length is 9 rather than 1.
> {code:java}
> 2022-08-24 03:01:39,534 INFO [Command processor] 
> org.apache.hadoop.hdfs.server.datanode.DataNode: processErasureCodingTasks  
> BlockECReconstructionInfo(
>   Recovering 
> BP-390041874-10.x.x.x-1550651014658:blk_-9223372032283192848_35319673088 
> From: [10.x.x.33:50010, 10.x.x.35:50010, 10.x.x.34:50010, 10.x.x.21:50010, 
> 10.x.x.40:50010, 10.x.x.36:50010, 10.x.x.151:50010, 10.x.x.27:50010] To: 
> [[10.x.x.37:50010, 10.x.x.21:50010, 10.x.x.32:50010, 10.x.x.27:50010, 
> 10.x.x.28:50010, 10.x.x.23:50010, 10.x.x.23:50010, 10.x.x.101:50010, 
> 10.x.x.32:50010])
>  Block Indices: [0, 1, 2, 4, 5, 6, 7, 8] {code}
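
To make the "9 rather than 1" mismatch concrete, here is a minimal sketch that derives the missing internal block indices from the "Block Indices" list in the log. The class and method names (MissingInternalBlocks, missingIndices) are hypothetical and are not part of HDFS; the only inputs taken from the report are the RS-6-3 layout (6 data + 3 parity = 9 internal blocks) and the live indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an HDFS class.
final class MissingInternalBlocks {

    // Returns every index in [0, totalBlocks) that does not appear in liveIndices.
    static List<Integer> missingIndices(int[] liveIndices, int totalBlocks) {
        boolean[] present = new boolean[totalBlocks];
        for (int idx : liveIndices) {
            present[idx] = true;
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < totalBlocks; i++) {
            if (!present[i]) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        int dataBlkNum = 6;   // from RS-6-3-1024k
        int parityBlkNum = 3; // from RS-6-3-1024k
        int[] live = {0, 1, 2, 4, 5, 6, 7, 8}; // "Block Indices" in the log
        int targetsSent = 9;                   // entries in the "To:" list

        List<Integer> missing = missingIndices(live, dataBlkNum + parityBlkNum);
        System.out.println("missing indices: " + missing);
        System.out.println("targets needed: " + missing.size()
            + ", targets sent: " + targetsSent);
    }
}
```

Only index 3 is absent, so a single target would have been enough for this reconstruction task.
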
> When the StripedWriter is initialized in the DN's StripedBlockReconstructor, it
> checks targetIndices.length <= parityBlkNum (here 9 <= 3, which fails), so this
> striped block will never be reconstructed successfully.
> {code:java}
> targetIndices = new short[targets.length];
> Preconditions.checkArgument(targetIndices.length <= parityBlkNum,
>     "Too much missed striped blocks."); {code}
>  
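
The failing check can be reproduced in isolation. This is a simplified stand-in, not the actual StripedWriter code: the class TargetCountCheck, the method allocateTargetIndices, and the local checkArgument are hypothetical, with checkArgument imitating the Guava Preconditions call so that the sketch runs without dependencies. It shows that one target passes for RS-6-3 while nine targets raise the exception seen in the DN log.

```java
// Hypothetical, dependency-free reproduction of the quoted check.
final class TargetCountCheck {

    // Local stand-in for Guava's Preconditions.checkArgument(boolean, Object).
    static void checkArgument(boolean expression, String message) {
        if (!expression) {
            throw new IllegalArgumentException(message);
        }
    }

    // Mirrors the two quoted lines: size the array from the number of
    // targets, then reject it if it exceeds the number of parity blocks.
    static short[] allocateTargetIndices(int targetCount, int parityBlkNum) {
        short[] targetIndices = new short[targetCount];
        checkArgument(targetIndices.length <= parityBlkNum,
            "Too much missed striped blocks.");
        return targetIndices;
    }

    public static void main(String[] args) {
        int parityBlkNum = 3; // RS-6-3-1024k

        // One missing internal block, one target: the check passes.
        System.out.println("accepted targets: "
            + allocateTargetIndices(1, parityBlkNum).length);

        // Nine targets, as in the BlockECReconstructionInfo above: rejected.
        try {
            allocateTargetIndices(9, parityBlkNum);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check depends only on the length of the target list, the DN rejects the task regardless of how many internal blocks are actually missing.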



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
