[
https://issues.apache.org/jira/browse/HDFS-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508038#comment-13508038
]
Hadoop QA commented on HDFS-4246:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12555641/HDFS-4246.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3586//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3586//console
This message is automatically generated.
> The exclude node list should be more forgiving, for each output stream
> ----------------------------------------------------------------------
>
> Key: HDFS-4246
> URL: https://issues.apache.org/jira/browse/HDFS-4246
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.0.0-alpha
> Reporter: Harsh J
> Assignee: Harsh J
> Priority: Minor
> Attachments: HDFS-4246.patch
>
>
> Originally observed by Inder on the mailing lists:
> {quote}
> Folks,
> I was wondering if there is any mechanism/logic to move a node back from the
> excludedNodeList to live nodes to be tried for new block creation.
> In the current DFSOutputStream code I do not see this. The use case is: if the
> write timeout is being reduced and certain nodes get aggressively added to
> the excludedNodeList, and the client caches DFSOutputStream, then the
> excludedNodes never get tried again in the lifetime of the application
> caching DFSOutputStream.
> {quote}
> This leads to a particular scenario that may affect smaller clusters more
> than larger ones:
> 1. A file is opened for continuous hflush/sync-based writes, such as an HBase
> WAL. This file is kept open for a very long time, by design.
> 2. Over time, nodes are excluded for various errors, such as DN crashes,
> network failures, etc.
> 3. Eventually, the exclude list equals or nearly equals the live nodes list,
> and the write suffers. Once the two are equal, the write fails with an error
> of not being able to get a block allocation.
> We should perhaps make the excludeNodes list a timed-cache collection, so
> that even if it begins filling up, the older excludes are pruned away, giving
> those nodes another try later.
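
The timed-cache idea above can be sketched roughly as follows. This is only an illustration, not the DFSOutputStream code or the attached patch: the class name TimedExcludeList, its methods, and the explicit nowMillis parameter (used here so the behaviour is easy to reason about) are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: an exclude list whose entries expire after a fixed period. */
class TimedExcludeList {
    private final long expiryMillis;
    // Node identifier -> time (in ms) at which the node was excluded.
    private final Map<String, Long> excludedAt = new HashMap<>();

    TimedExcludeList(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Record that a node was excluded at time nowMillis. */
    void exclude(String node, long nowMillis) {
        excludedAt.put(node, nowMillis);
    }

    /** Prune expired entries, then report whether the node is still excluded. */
    boolean isExcluded(String node, long nowMillis) {
        prune(nowMillis);
        return excludedAt.containsKey(node);
    }

    /** Number of nodes still excluded at time nowMillis. */
    int size(long nowMillis) {
        prune(nowMillis);
        return excludedAt.size();
    }

    // Remove every entry that has been excluded for at least expiryMillis.
    private void prune(long nowMillis) {
        excludedAt.values().removeIf(t -> nowMillis - t >= expiryMillis);
    }
}
```

With an expiry of 1000 ms, a node excluded at time 0 is still skipped at 999 ms and becomes eligible for block allocation again at 1000 ms, so the exclude list can no longer grow without bound over the lifetime of a long-lived stream.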
> One place we have to be careful about, though, is rack failures. Racks
> sometimes do not come back fast enough, which can be problematic for retry
> code with such an eventually-forgiving list. Perhaps we can retain forgiven
> nodes and, if they are excluded again, double or triple the forgiveness value
> (in time units) to counter this? It's just one idea.
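
One way to read the "double the forgiveness value" idea is sketched below. It is an assumption-laden illustration rather than anything taken from the patch: the class BackoffExcludeList, the maxMillis cap, and the choice to never reset a node's history are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: repeat offenders are excluded for progressively longer periods. */
class BackoffExcludeList {
    private final long baseMillis; // forgiveness period for a first exclusion
    private final long maxMillis;  // assumed upper bound on the doubled period
    // Node identifier -> time (in ms) until which the node stays excluded.
    private final Map<String, Long> excludedUntil = new HashMap<>();
    // Node identifier -> last period applied; retained after the node is forgiven.
    private final Map<String, Long> lastPeriod = new HashMap<>();

    BackoffExcludeList(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Exclude a node at time nowMillis and return the forgiveness period applied. */
    long exclude(String node, long nowMillis) {
        Long previous = lastPeriod.get(node);
        // First offence uses the base period; later ones double it, up to the cap.
        long period = (previous == null) ? baseMillis : Math.min(previous * 2, maxMillis);
        lastPeriod.put(node, period);
        excludedUntil.put(node, nowMillis + period);
        return period;
    }

    /** Report whether the node is still excluded, forgiving it once its period has passed. */
    boolean isExcluded(String node, long nowMillis) {
        Long until = excludedUntil.get(node);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            excludedUntil.remove(node); // forgiven, but lastPeriod keeps the history
            return false;
        }
        return true;
    }
}
```

With a base of 1000 ms and a cap of 3000 ms, a node that keeps failing is excluded for 1000 ms, then 2000 ms, then 3000 ms. Whether the history should ever be reset after a long healthy stretch is left open here, as it is in the description above.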