[ 
https://issues.apache.org/jira/browse/HDFS-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164350#comment-17164350
 ] 

Ayush Saxena commented on HDFS-15443:
-------------------------------------

In such a case there are only two solutions. The first is to fail the 
operation and raise an alarm as soon as you find out the conf is invalid. The 
second is to correct the invalid value when you observe it and use the default 
instead, as is done in many places, such as {{DatanodeAdminMonitorBase}} and a 
bunch of others. The only thing I feel we can't do is tolerate the invalid 
value and go ahead with it, giving it a pass only where it is creating 
trouble, which is what HDFS-15439 initially tended to do. That is why I thought 
that, if you don't want to crash, it is better to change to the default. The 
choice between approaches #1 and #2 is made on a case-by-case basis.

Here, in the case of the Datanode, it is a long-running service and one of the 
critical parts of the cluster, so I think crashing and alarming on a wrong conf 
is the better option.
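
To make the two options concrete, here is a minimal sketch of both. It is not 
the attached patch and it does not use any Hadoop API: the class name, the 
method names, and the "at least 1" lower bound are assumptions made for 
illustration, and 4096 is assumed to be the default from hdfs-default.xml.

```java
/** Hypothetical helper for illustration; not part of Hadoop or of any HDFS-15443 patch. */
final class TransferThreadsCheck {
    static final String KEY = "dfs.datanode.max.transfer.threads";
    // Assumed default, taken to match hdfs-default.xml.
    static final int DEFAULT_VALUE = 4096;

    /** Approach #1: fail fast so an invalid value stops startup with a clear message. */
    static int failFast(int configured) {
        // Assumption: anything below 1 is treated as invalid.
        if (configured < 1) {
            throw new IllegalArgumentException(
                KEY + " must be at least 1, but was " + configured);
        }
        return configured;
    }

    /** Approach #2: log a warning and fall back to the default value. */
    static int orDefault(int configured) {
        if (configured < 1) {
            System.err.println("Invalid " + KEY + "=" + configured
                + ", using default " + DEFAULT_VALUE);
            return DEFAULT_VALUE;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(orDefault(-5)); // invalid, so the default is used
        System.out.println(failFast(8));   // valid, returned unchanged
        try {
            failFast(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With failFast the operator sees the problem at startup, while orDefault keeps 
the service running but can hide the misconfiguration unless the warning is 
noticed.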

 

[~AMC-team] I think we can keep the current patch, just confirm the Jenkins 
warnings aren't related.

> Setting dfs.datanode.max.transfer.threads to a very small value can cause 
> strange failure.
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15443
>                 URL: https://issues.apache.org/jira/browse/HDFS-15443
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: AMC-team
>            Priority: Major
>         Attachments: HDFS-15443.000.patch, HDFS-15443.001.patch, 
> HDFS-15443.002.patch
>
>
> Configuration parameter dfs.datanode.max.transfer.threads specifies the 
> maximum number of threads to use for transferring data in and out of the DN. 
> This is a vital param that needs to be tuned carefully. 
> {code:java}
> // DataXceiverServer.java
> // Make sure the xceiver count is not exceeded
> int curXceiverCount = datanode.getXceiverCount();
> if (curXceiverCount > maxXceiverCount) {
>   throw new IOException("Xceiver count " + curXceiverCount
>       + " exceeds the limit of concurrent xceivers: "
>       + maxXceiverCount);
> }
> {code}
> There are many issues that are caused by not setting this param to an 
> appropriate value. However, there is no check code to restrict the parameter. 
> Although having a hard-and-fast rule is difficult because we need to consider 
> the number of cores, main memory, etc., *we can prevent users from setting 
> this value to an absolutely wrong value by accident.* (e.g. a negative value 
> that totally breaks the availability of the datanode.)
> *How to fix:*
> Add proper check code for the parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
