[ https://issues.apache.org/jira/browse/HDFS-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937797#comment-17937797 ]

NaihaoFan commented on HDFS-4246:
---------------------------------

Hi [~qwertymaniac], hope all is well.

I have a question about the method getExcludedNodes: why does it use
`excludedNodes.getAllPresent(excludedNodes.asMap().keySet()).keySet().toArray(DatanodeInfo.EMPTY_ARRAY);`
instead of simply
`excludedNodes.asMap().keySet().toArray(DatanodeInfo.EMPTY_ARRAY);`?

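It seems the first (current) implementation will refresh the cached nodes: according to the Guava wiki, cache reads such as getAllPresent reset an entry's access time, whereas iterating the asMap() collection views does not. Refer to:
[CachesExplained · google/guava Wiki|https://github.com/google/guava/wiki/CachesExplained#asmap]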

Do we intend to refresh the cached nodes every time they are read? If not, I think not refreshing them is the right implementation (unless I have missed some background).
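To make the distinction concrete, here is a small stdlib-only Java model of the behaviour the Guava wiki documents: under `expireAfterAccess`, read operations such as `getAllPresent` reset an entry's access time, while iterating the `asMap()` collection views does not; under `expireAfterWrite`, no read extends an entry's lifetime. The class `ExpiringKeySet` and its methods are hypothetical and are not Guava or HDFS code. Which expiry policy the HDFS client actually configures for `excludedNodes` should be checked in the source; the model only shows why the answer to the question depends on it.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a time-expiring key set; NOT Guava's implementation.
public class ExpiringKeySet<K> {
    public enum Policy { AFTER_WRITE, AFTER_ACCESS }

    private final Policy policy;
    private final long ttlMillis;
    // key -> timestamp from which its expiry clock is measured
    private final Map<K, Long> stamps = new LinkedHashMap<>();

    public ExpiringKeySet(Policy policy, long ttlMillis) {
        this.policy = policy;
        this.ttlMillis = ttlMillis;
    }

    // A write always restarts the entry's clock, under either policy.
    public void put(K key, long now) {
        stamps.put(key, now);
    }

    private void evictExpired(long now) {
        stamps.values().removeIf(t -> now - t >= ttlMillis);
    }

    // Modeled on iterating asMap().keySet(): returns live keys, touches no timestamps.
    public Set<K> viewKeys(long now) {
        evictExpired(now);
        return new HashSet<>(stamps.keySet());
    }

    // Modeled on getAllPresent(keys): a read, which resets access time under AFTER_ACCESS.
    public Set<K> readAllPresent(long now) {
        evictExpired(now);
        if (policy == Policy.AFTER_ACCESS) {
            stamps.replaceAll((k, t) -> now);
        }
        return new HashSet<>(stamps.keySet());
    }

    public static void main(String[] args) {
        ExpiringKeySet<String> byAccess = new ExpiringKeySet<>(Policy.AFTER_ACCESS, 100);
        byAccess.put("dn1", 0);
        byAccess.readAllPresent(90);                  // read at t=90 resets the access time
        System.out.println(byAccess.viewKeys(150));   // [dn1]: still present at t=150

        ExpiringKeySet<String> byWrite = new ExpiringKeySet<>(Policy.AFTER_WRITE, 100);
        byWrite.put("dn1", 0);
        byWrite.readAllPresent(90);                   // read does not extend the lifetime
        System.out.println(byWrite.viewKeys(150));    // []: expired at t=100
    }
}
```

The caller-supplied `now` keeps the model deterministic; the point is only that "does reading refresh the node?" has a different answer per expiry policy.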

> The exclude node list should be more forgiving, for each output stream
> ----------------------------------------------------------------------
>
>                 Key: HDFS-4246
>                 URL: https://issues.apache.org/jira/browse/HDFS-4246
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 2.1.0-beta
>
>         Attachments: HDFS-4246.patch, HDFS-4246.patch, HDFS-4246.patch
>
>
> Originally observed by Inder on the mailing lists:
> {quote}
> Folks,
> I was wondering if there is any mechanism/logic to move a node back from the 
> excludedNodeList to live nodes to be tried for new block creation.
> In the current DFSOutputStream code I do not see this. The use-case is if the 
> write timeout is being reduced and certain nodes get aggressively added to 
> the excludedNodeList and the client caches DFSOutputStream, then the 
> excludedNodes never get tried again in the lifetime of the application 
> caching DFSOutputStream.
> {quote}
> What this leads to, is a special scenario, that may impact smaller clusters 
> more than larger ones:
> 1. A file is opened for continuous hflush/sync-based writes, such as an HBase 
> WAL. By design, this file is kept open for a very long time.
> 2. Over time, nodes are excluded due to various errors, such as DN crashes 
> and network failures.
> 3. Eventually, the exclude list equals (or nearly equals) the live nodes list, 
> and the write suffers. Once the two are equal, the write fails outright with 
> an error saying it could not get a block allocation.
> We should perhaps make the excludeNodes list a timed-cache collection, so 
> that even if it begins filling up, older exclusions are pruned away and 
> those nodes are tried again later.
> One case we have to be careful about, though, is rack failures. Racks 
> sometimes do not come back quickly enough, which can be problematic for retry 
> code using such an eventually-forgiving list. Perhaps we could remember 
> forgiven nodes and, if they are excluded again, double or triple their 
> forgiveness period (in time units) to counter this? It's just one idea.
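The timed, eventually-forgiving exclude list proposed in the description can be sketched as follows. This is a hypothetical illustration, not HDFS code: the class `ForgivingExcludeList`, the choice of doubling (the description suggests doubling or tripling), and the cap on doublings are all assumptions. Nodes are forgiven after a base period, their strike count is retained after forgiveness, and each repeat exclusion doubles the period, which is meant to keep slow-to-recover racks out longer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an "eventually-forgiving" exclude list; NOT HDFS code.
// The doubling rule and the cap are illustrative assumptions.
public class ForgivingExcludeList {
    private final long baseForgivenessMillis;
    private final int maxDoublings;
    // node -> time at which the node is forgiven (eligible for block allocation again)
    private final Map<String, Long> forgivenAt = new HashMap<>();
    // node -> number of times it has been excluded; kept even after forgiveness
    private final Map<String, Integer> strikes = new HashMap<>();

    public ForgivingExcludeList(long baseForgivenessMillis, int maxDoublings) {
        this.baseForgivenessMillis = baseForgivenessMillis;
        this.maxDoublings = maxDoublings;
    }

    // Exclude a node; each repeat exclusion doubles its forgiveness period, up to a cap.
    public void exclude(String node, long now) {
        int previous = strikes.getOrDefault(node, 0);
        strikes.put(node, previous + 1);
        int doublings = Math.min(previous, maxDoublings);
        long period = baseForgivenessMillis << doublings;
        forgivenAt.put(node, now + period);
    }

    public boolean isExcluded(String node, long now) {
        Long until = forgivenAt.get(node);
        return until != null && now < until;
    }

    // Nodes still excluded at 'now'; entries whose period has elapsed are pruned.
    public Set<String> excludedNodes(long now) {
        forgivenAt.values().removeIf(until -> now >= until);
        return new TreeSet<>(forgivenAt.keySet());
    }

    public static void main(String[] args) {
        ForgivingExcludeList list = new ForgivingExcludeList(1_000, 3);
        list.exclude("dn1", 0);                              // 1st strike: forgiven at t=1000
        System.out.println(list.excludedNodes(500));         // [dn1]
        System.out.println(list.excludedNodes(1_000));       // []: dn1 may be tried again
        list.exclude("dn1", 1_000);                          // 2nd strike: forgiven at t=3000
        System.out.println(list.isExcluded("dn1", 2_500));   // true
        System.out.println(list.isExcluded("dn1", 3_000));   // false
    }
}
```

Passing `now` explicitly keeps the sketch deterministic and testable; the cap bounds how long a repeatedly failing node can stay excluded, so the list never becomes permanently unforgiving.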



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
