[
https://issues.apache.org/jira/browse/HDFS-16479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499870#comment-17499870
]
Yuanbo Liu commented on HDFS-16479:
-----------------------------------
[~tasanuma] Basically we use
dfs.namenode.replication.max-streams
dfs.namenode.replication.max-streams-hard-limit
to limit the rate of replication work on a DataNode. If an index of a striped
block is marked as a busy index, that means the DataNode is overloaded with
replication work. Adding busy indices back to the live indices would ignore the
DN's load, and I'm a little worried that the DN could crash if the load becomes
too heavy. So increasing those two configurations will reduce the
"IllegalArgumentException: No enough live striped blocks" exceptions, which is
the strategy we are currently using in our clusters.
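For reference, those two properties are set in hdfs-site.xml on the NameNode. The values below are only illustrative examples of raising them above the defaults, not tuning recommendations:

```xml
<!-- hdfs-site.xml: throttle replication/reconstruction streams per DataNode.
     Example values only; appropriate numbers depend on cluster load. -->
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>8</value>
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>16</value>
</property>
```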
> EC: When reconstructing an EC block, liveBusyBlockIndicies is not included,
> so reconstruction will fail
> ----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16479
> URL: https://issues.apache.org/jira/browse/HDFS-16479
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, erasure-coding
> Reporter: Yuanbo Liu
> Priority: Critical
>
> We got this exception from DataNodes
> {noformat}
> java.lang.IllegalArgumentException: No enough live striped blocks.
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.<init>(StripedReader.java:128)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReconstructor.<init>(StripedReconstructor.java:135)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:41)
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:133)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:796)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1314)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1360)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1287)
> {noformat}
> After going through the code of ErasureCodingWork.java, we found
> {code:java}
> targets[0].getDatanodeDescriptor().addBlockToBeErasureCoded(
>     new ExtendedBlock(blockPoolId, stripedBlk), getSrcNodes(), targets,
>     getLiveBlockIndicies(), stripedBlk.getErasureCodingPolicy());
> {code}
>
> liveBusyBlockIndicies is not counted as part of liveBlockIndicies, hence
> erasure-coding reconstruction will sometimes fail with "No enough live
> striped blocks".
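As an illustration of the direction the report suggests (a hypothetical sketch, not the actual HDFS patch; the helper name and array handling are assumptions), merging the busy indices back into the live indices before scheduling the reconstruction task could look like this:

```java
import java.util.Arrays;

public class MergeIndices {
    /**
     * Hypothetical helper: combine the live block indices with the busy ones
     * so the reconstruction task sees every internal block that still exists,
     * instead of failing with "No enough live striped blocks".
     */
    static byte[] mergeLiveAndBusy(byte[] liveIndices, byte[] busyIndices) {
        // Append busyIndices after liveIndices in a single array.
        byte[] merged = Arrays.copyOf(liveIndices, liveIndices.length + busyIndices.length);
        System.arraycopy(busyIndices, 0, merged, liveIndices.length, busyIndices.length);
        // Sort so the indices stay in block-index order.
        Arrays.sort(merged);
        return merged;
    }

    public static void main(String[] args) {
        byte[] live = {0, 1, 3, 5};
        byte[] busy = {2, 4};
        // prints [0, 1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(mergeLiveAndBusy(live, busy)));
    }
}
```

Note the trade-off the comment above raises: handing busy indices to the reconstruction task means reading from already-overloaded DataNodes, so any real fix would still need to respect the DN load limits.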
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]