[
https://issues.apache.org/jira/browse/HDFS-16479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499870#comment-17499870
]
Yuanbo Liu commented on HDFS-16479:
-----------------------------------
[~tasanuma] Basically we use
dfs.namenode.replication.max-streams
dfs.namenode.replication.max-streams-hard-limit
to limit the rate of replication work on a DataNode. If an index of a striped
block is marked as a busy index, that means the DataNode is overloaded with
replication work. Adding busy indices back to the live indices would ignore the
DN's load, and I'm a little worried that the DN could crash if the load becomes
too heavy. So increasing those two configurations will reduce the
"IllegalArgumentException: No enough live striped blocks" exceptions, which is
the strategy we are currently using in our clusters.
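For reference, those two properties are set in hdfs-site.xml on the NameNode. The values below are only illustrative examples of raising them above the defaults, not tuning recommendations:

```xml
<!-- hdfs-site.xml: throttle replication/reconstruction streams per DataNode.
     Example values only; appropriate numbers depend on cluster load. -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>8</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>16</value>
</property>
```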
> EC: When reconstructing an EC block, liveBusyBlockIndicies is not included,
> so reconstruction will fail
> ----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16479
> URL: https://issues.apache.org/jira/browse/HDFS-16479
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, erasure-coding
> Reporter: Yuanbo Liu
> Priority: Critical
>
> We got this exception from DataNodes
> {noformat}
> java.lang.IllegalArgumentException: No enough live striped blocks.
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.<init>(StripedReader.java:128)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReconstructor.<init>(StripedReconstructor.java:135)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:41)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:133)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:796)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1314)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1360)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1287)
> {noformat}
> After going through the code of ErasureCodingWork.java, we found
> {code:java}
> targets[0].getDatanodeDescriptor().addBlockToBeErasureCoded(
>     new ExtendedBlock(blockPoolId, stripedBlk), getSrcNodes(), targets,
>     getLiveBlockIndicies(), stripedBlk.getErasureCodingPolicy());
> {code}
>
> liveBusyBlockIndicies is not counted as part of liveBlockIndicies, hence
> erasure-coding reconstruction will sometimes fail with "No enough live
> striped blocks".
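As an illustration of the direction the report suggests (a hypothetical sketch, not the actual HDFS patch; the helper name and array handling are assumptions), merging the busy indices back into the live indices before scheduling the reconstruction task could look like this:

```java
import java.util.Arrays;

public class MergeIndices {
    /**
     * Hypothetical helper: combine the live block indices with the busy ones
     * so the reconstruction task sees every internal block that still exists,
     * instead of failing with "No enough live striped blocks".
     */
    static byte[] mergeLiveAndBusy(byte[] liveIndices, byte[] busyIndices) {
        // Append busyIndices after liveIndices in a single array.
        byte[] merged = Arrays.copyOf(liveIndices, liveIndices.length + busyIndices.length);
        System.arraycopy(busyIndices, 0, merged, liveIndices.length, busyIndices.length);
        // Sort so the indices stay in block-index order.
        Arrays.sort(merged);
        return merged;
    }

    public static void main(String[] args) {
        byte[] live = {0, 1, 3, 5};
        byte[] busy = {2, 4};
        // prints [0, 1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(mergeLiveAndBusy(live, busy)));
    }
}
```

Note the trade-off the comment above raises: handing busy indices to the reconstruction task means reading from already-overloaded DataNodes, so any real fix would still need to respect the DN load limits.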
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]