[
https://issues.apache.org/jira/browse/HDFS-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224911#comment-17224911
]
Wei-Chiu Chuang commented on HDFS-15650:
----------------------------------------
Sounds like a possible scenario... thanks for sharing that. I think in the
current DataNode design there's no good way to prevent that from happening in
general. A workaround is to increase the DN thread pool size, but I am aware of
other cases where the DN exhausts its thread pool, even to the point of
exceeding the maximum number of open file descriptors.
A quick remedy for this is to update the description in hdfs-default.xml
{noformat}
<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>
  <description>
    Default timeout value in milliseconds for all sockets.
  </description>
</property>
{noformat}
and add a reminder that in an erasure-coded cluster it is recommended to keep
dfs.client.socket-timeout consistent between clients and DNs.
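For example, an operator could pin the same value in hdfs-site.xml deployed on both the client and DataNode side (the 60000 ms value is just the default shown above, not a tuning recommendation):
{noformat}
<!-- hdfs-site.xml, deployed identically on clients and DataNodes -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>
</property>
{noformat}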
> Make the socket timeout for computing checksum of striped blocks configurable
> -----------------------------------------------------------------------------
>
> Key: HDFS-15650
> URL: https://issues.apache.org/jira/browse/HDFS-15650
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, ec, erasure-coding
> Reporter: Yushi Hayasaka
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Because a DataNode fetches the checksums of EC internal blocks from other
> DataNodes when computing the checksum of a striped block, the timeout it
> uses should be configurable rather than hard-coded as it is now.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)