[jira] [Commented] (HDFS-17106) NameNode can timeout during initialization with dfs.datanode.volumes.replica-add.threadpool.size being 0

ASF GitHub Bot (Jira) Sat, 16 Sep 2023 08:33:19 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-17106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765989#comment-17765989
 ]


ASF GitHub Bot commented on HDFS-17106:
---------------------------------------

teamconfx opened a new pull request, #6090:
URL: https://github.com/apache/hadoop/pull/6090

   ### Description of PR
   
   When `dfs.datanode.volumes.replica-add.threadpool.size` is 0, the HDFS 
cluster never starts and eventually times out.
   
   To reproduce:
   1. Set `dfs.datanode.volumes.replica-add.threadpool.size` to 0.
   2. Run `mvn surefire:test -Dtest=org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics#testExcessBlocks`.
   
   This PR improves the description of 
`dfs.datanode.volumes.replica-add.threadpool.size` in `hdfs-default.xml` by 
specifying that the value must be positive. The PR also checks that the value 
is positive before it is used to initialize `addReplicaThreadPool`.
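
A minimal sketch of this kind of fail-fast check, for illustration only. It is not the code from this PR: the class `ReplicaAddPoolConfig` and the methods `checkPositive` and `createPool` are hypothetical names, the use of `java.util.concurrent.ForkJoinPool` for the pool is an assumption, and plain Java stands in for Guava's `Preconditions` so the example is self-contained (it therefore throws `IllegalArgumentException` rather than the `IllegalStateException` that `checkState` would raise).

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper for illustration; not the code from the actual patch.
final class ReplicaAddPoolConfig {
  // Configuration key named in this issue.
  static final String KEY = "dfs.datanode.volumes.replica-add.threadpool.size";

  private ReplicaAddPoolConfig() {}

  /** Returns the size unchanged if it is positive, otherwise fails fast. */
  static int checkPositive(int size) {
    if (size <= 0) {
      throw new IllegalArgumentException(
          KEY + " should be positive but was " + size);
    }
    return size;
  }

  /**
   * Builds the pool only after the configured size has been validated.
   * Assumption: the pool is a ForkJoinPool; the source does not say so.
   */
  static ForkJoinPool createPool(int configuredSize) {
    return new ForkJoinPool(checkPositive(configuredSize));
  }

  public static void main(String[] args) {
    // A positive size is accepted.
    ForkJoinPool pool = createPool(4);
    System.out.println("parallelism = " + pool.getParallelism());
    pool.shutdown();

    // A size of 0 is rejected with a message that names the property.
    try {
      createPool(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Validating at the point where the value is read turns a slow startup timeout into an immediate error that names the misconfigured property.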
   
   ### How was this patch tested?
   
   Unit test
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under ASF 2.0?
   - [ ] If applicable, have you updated the LICENSE, LICENSE-binary, 
NOTICE-binary files?
   
   




> NameNode can timeout during initialization with 
> dfs.datanode.volumes.replica-add.threadpool.size being 0
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17106
>                 URL: https://issues.apache.org/jira/browse/HDFS-17106
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> When {{dfs.datanode.volumes.replica-add.threadpool.size}} is set to 0, the 
> HDFS cluster never starts and eventually times out.
> h2. Buggy code:
> Still investigating
> h2. StackTrace:
> {noformat}
> java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
>     at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1503)
>     at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:973)
>     at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>     at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
>     at org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.setUp(TestNameNodeMetrics.java:166){noformat}
> h2. Reproduce:
> (1) Set {{dfs.datanode.volumes.replica-add.threadpool.size}} to 0
> (2) Run a simple test that exercises this parameter, e.g. 
> {{org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics#testExcessBlocks}}
> h2. Solution:
> If 0 is not a valid value, it would be good to have a check for it.
> For example, {{file.bytes-per-checksum}} has a check that requires the value 
> to be larger than 0:
> {noformat}
> Preconditions.checkState(bytesPerChecksum > 0,
>     "bytes per checksum should be positive but was %s", bytesPerChecksum);{noformat}
> We could add a similar check:
> {noformat}
> Preconditions.checkState(addReplicaThreadPool > 0,
>     "addReplicaThreadPool should be positive but was %s", addReplicaThreadPool);{noformat}
> For an easy reproduction, run the reproduce.sh in the attachment. We are 
> happy to provide a patch if this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)