[
https://issues.apache.org/jira/browse/HDFS-16735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583803#comment-17583803
]
ASF GitHub Bot commented on HDFS-16735:
---------------------------------------
goiri commented on code in PR #4780:
URL: https://github.com/apache/hadoop/pull/4780#discussion_r952993440
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java:
##########
@@ -492,12 +498,12 @@ void heartbeatCheck() {
// log nodes detected as stale since last heartBeat
dumpStaleNodes(staleNodes);
- allAlive = dead == null && failedStorage == null;
+ allAlive = deadDatanodes.size() == 0 && failedStorages.size() == 0;
Review Comment:
Use `isEmpty()` instead of `size() == 0`.
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java:
##########
@@ -96,6 +97,9 @@ class HeartbeatManager implements DatanodeStatistics {
enableLogStaleNodes = conf.getBoolean(
DFSConfigKeys.DFS_NAMENODE_ENABLE_LOG_STALE_DATANODE_KEY,
DFSConfigKeys.DFS_NAMENODE_ENABLE_LOG_STALE_DATANODE_DEFAULT);
+ this.removeBatchNum =
Review Comment:
```
this.removeBatchNum = conf.getInt(
DFSConfigKeys.DFS_NAMENODE_REMOVE_BAD_BATCH_NUM,
DFSConfigKeys.DFS_NAMENODE_REMOVE_BAD_BATCH_NUM_DEFAULT);
```
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java:
##########
@@ -436,12 +440,14 @@ void heartbeatCheck() {
return;
}
boolean allAlive = false;
+ // Locate limited dead nodes.
+ List<DatanodeDescriptor> deadDatanodes = new ArrayList<>(removeBatchNum);
+ // Locate limited failed storages that isn't on a dead node.
+ List<DatanodeStorageInfo> failedStorages = new ArrayList<>(removeBatchNum);
while (!allAlive) {
Review Comment:
Add a line break before this line.
> Reduce the number of HeartbeatManager loops
> -------------------------------------------
>
> Key: HDFS-16735
> URL: https://issues.apache.org/jira/browse/HDFS-16735
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> HeartbeatManager processes only one dead datanode (and one failed storage) per
> round in heartbeatCheck(). If there are ten failed storages, the states of all
> datanodes must therefore be scanned ten times, which is unnecessary and wastes
> resources. This patch makes the number of bad storages processed per scan
> configurable.
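
To illustrate why the batch size matters, here is a minimal, self-contained sketch of the scan loop described above. It is not the HeartbeatManager code: the `Node` class, `cluster` helper, and `removeDeadNodes` method are hypothetical stand-ins, and `batchSize` only plays the role of the `removeBatchNum` field from the patch. The sketch assumes each round scans every node once, collects up to `batchSize` dead nodes, removes them, and repeats until a scan finds none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names do not come from the Hadoop code base.
class BatchedHeartbeatCheckSketch {

  /** Stand-in for a datanode that only records whether it is dead. */
  static final class Node {
    final String name;
    final boolean dead;

    Node(String name, boolean dead) {
      this.name = name;
      this.dead = dead;
    }
  }

  /** Builds a list of {@code total} nodes, the first {@code dead} of which are dead. */
  static List<Node> cluster(int total, int dead) {
    List<Node> nodes = new ArrayList<>(total);
    for (int i = 0; i < total; i++) {
      nodes.add(new Node("dn-" + i, i < dead));
    }
    return nodes;
  }

  /**
   * Removes every dead node, collecting at most {@code batchSize} per scan.
   *
   * @return the number of full scans over the node list
   */
  static int removeDeadNodes(List<Node> nodes, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be at least 1");
    }
    int scans = 0;
    boolean allAlive = false;

    while (!allAlive) {
      List<Node> deadBatch = new ArrayList<>(batchSize);
      // One full pass over all nodes; stop collecting once the batch is full.
      for (Node n : nodes) {
        if (n.dead && deadBatch.size() < batchSize) {
          deadBatch.add(n);
        }
      }
      scans++;
      // The loop ends only after a scan that finds nothing to remove.
      allAlive = deadBatch.isEmpty();
      nodes.removeAll(deadBatch);
    }
    return scans;
  }

  public static void main(String[] args) {
    for (int batchSize : new int[] {1, 5, 10}) {
      int scans = removeDeadNodes(cluster(100, 10), batchSize);
      System.out.println("batchSize=" + batchSize + " scans=" + scans);
    }
  }
}
```

In this sketch the final scan, which confirms that nothing is left to remove, is counted as well, so the scan count is one more than the number of removal rounds. A larger batch size cuts the number of full passes over the node list roughly in proportion.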
--
This message was sent by Atlassian Jira
(v8.20.10#820010)