[ https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796322#comment-13796322 ]

Colin Patrick McCabe commented on HDFS-5366:
--------------------------------------------

The other question that came up in discussion on HDFS-5096 is whether we should 
have a dedicated thread (independent of the {{CacheReplicationMonitor}} thread) 
which periodically re-examines the outstanding cache and uncache requests, and 
reschedules them to a different node if they aren't fulfilled.  I've thought 
about this, but I'm not sure that we need it.
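To make the trade-off concrete, here is a rough sketch of how the existing rescan could pick up unfulfilled directives instead of a dedicated thread doing it. All names here are hypothetical, not the actual HDFS-4949 code: the idea is just to timestamp each dispatched directive and let the periodic rescan reschedule anything outstanding past a threshold.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: record when each cache directive was dispatched to a
// DataNode, and let the periodic rescan (rather than a dedicated
// "rescheduler thread") move directives that have been pending too long.
class PendingCacheTracker {
    // directive id -> wall-clock time (ms) when DNA_CACHE was sent
    private final Map<Long, Long> sentTimeMs = new HashMap<>();
    private final long timeoutMs;

    PendingCacheTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void recordSent(long directiveId, long nowMs) {
        sentTimeMs.put(directiveId, nowMs);
    }

    void recordFulfilled(long directiveId) {
        sentTimeMs.remove(directiveId);
    }

    // Called from the rescan loop: true if the directive has been
    // outstanding long enough to be rescheduled onto a different node.
    boolean shouldReschedule(long directiveId, long nowMs) {
        Long sent = sentTimeMs.get(directiveId);
        return sent != null && nowMs - sent >= timeoutMs;
    }
}
```

With a timeout on the order of the rescan period (minutes), this check costs one map lookup per directive per rescan, which is why a separate thread buys us little here.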

The problem is that both caching and uncaching take time.  Caching takes time 
because it involves reading from disk.  Uncaching takes time because a client 
might have an mmap that needs to be revoked.  The involuntary revocation period 
will be at least 5 minutes, to avoid clients getting burned by GC pauses.

If we're too aggressive about rescheduling our cache/uncache operations, we may 
create a lot of churn.  Since the period of such a "rescheduler thread" would be 
measured in minutes anyway, isn't it simpler to just use the rescanning thread 
to handle this scenario?

The other problem is that we currently rely on the {{DatanodeManager}} to tell 
us when a node is bad.  Its timeouts are generous (10.5 minutes by default to 
declare a node dead), so the proposed "rescheduler" would either have to 
maintain its own list of who is naughty and nice, or have a really long period 
(again overlapping with the rescanner thread).  I don't really want to 
duplicate the deadNodes list...

I do think we should resend the {{DNA_CACHE}}, etc. as I mentioned above.  Networks 
do lose messages, after all.  But we might have to assume that if a DN tells us 
it can cache X bytes, that it's telling the truth.  Otherwise, the failure 
cases we have to think about tend to proliferate.
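A bounded resend is cheap to track per command.  The sketch below is illustrative only (the class and field names are made up, and the retry budget of 3 is an assumption, not anything decided on this issue): each heartbeat asks whether the command should be sent again, and we give up once the budget is spent rather than retrying forever.

```java
// Hypothetical sketch of resending DNA_CACHE / DNA_UNCACHE a bounded number
// of times before giving up.  Names and the retry budget are illustrative,
// not the actual HDFS-4949 code.
class CacheCommandRetry {
    static final int MAX_SENDS = 3;  // assumed budget: initial send + 2 retries
    private int sends = 0;

    // Returns true if the command should be (re)sent on this heartbeat.
    boolean trySend() {
        if (sends >= MAX_SENDS) {
            return false;  // budget exhausted: stop retrying
        }
        sends++;
        return true;
    }

    // Call when the DN reports the block as cached/uncached.
    void reset() {
        sends = 0;
    }
}
```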

> recaching improvements
> ----------------------
>
>                 Key: HDFS-5366
>                 URL: https://issues.apache.org/jira/browse/HDFS-5366
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-4949
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> There are a few things about our HDFS-4949 recaching strategy that could be 
> improved.
> * We should monitor the DN's maximum and current mlock'ed memory consumption 
> levels, so that we don't ask the DN to do stuff it can't.
> * We should not try to initiate caching on stale DataNodes (although we 
> should not recache things stored on such nodes until they're declared dead).
> * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few 
> times before giving up.  Currently, we only send it once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
