[
https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796322#comment-13796322
]
Colin Patrick McCabe commented on HDFS-5366:
--------------------------------------------
The other question that came up in discussion on HDFS-5096 is whether we should
have a dedicated thread (independent of the {{CacheReplicationMonitor}} thread)
which periodically re-examines the outstanding cache and uncache requests, and
reschedules them to a different node if they aren't fulfilled. I've thought
about this, but I'm not sure that we need it.

The problem is that both caching and uncaching take time. Caching takes time
because it involves reading the data from disk. Uncaching takes time because a
client might hold an mmap that needs to be revoked, and the involuntary
revocation period will be at least 5 minutes so that we don't pull mappings out
from under clients that are merely stalled in long GC pauses.

If we're too aggressive about rescheduling our cache/uncache operations, we may
create a lot of churn. If the period of such a "rescheduler thread" is going to
be measured in minutes anyway, isn't it simpler to just have the rescanning
thread handle this scenario?
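
To make the trade-off concrete, here is a rough sketch of what folding the
retry logic into the periodic rescan might look like, as opposed to spinning up
a separate rescheduler thread. This is not the actual {{CacheReplicationMonitor}}
code; every class, field, and method name below is invented for illustration.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: retrying unfulfilled cache work inside the existing
// periodic rescan, rather than in a dedicated rescheduler thread.
public class RescanRetrySketch {
  // Illustrative stand-in for a pending DNA_CACHE assignment.
  static class PendingCacheWork {
    final String blockId;
    final String datanode;
    final long scheduledAtMs;
    PendingCacheWork(String blockId, String datanode, long scheduledAtMs) {
      this.blockId = blockId;
      this.datanode = datanode;
      this.scheduledAtMs = scheduledAtMs;
    }
  }

  // How long we tolerate an unacknowledged cache request before considering a
  // different node. Caching involves disk reads and uncaching may wait on mmap
  // revocation (>= 5 minutes), so anything much shorter just creates churn.
  private static final long RETRY_THRESHOLD_MS = TimeUnit.MINUTES.toMillis(5);

  private final List<PendingCacheWork> pending = new ArrayList<>();

  // Called from the periodic rescan (whose period is measured in minutes),
  // not from a separate thread with its own timer.
  void rescan(long nowMs) {
    List<PendingCacheWork> stillPending = new ArrayList<>();
    for (PendingCacheWork work : pending) {
      if (nowMs - work.scheduledAtMs > RETRY_THRESHOLD_MS) {
        // The request has been outstanding too long; pick another node
        // during this same rescan pass.
        rescheduleElsewhere(work);
      } else {
        stillPending.add(work);
      }
    }
    pending.clear();
    pending.addAll(stillPending);
  }

  private void rescheduleElsewhere(PendingCacheWork work) {
    // Placeholder: a real implementation would choose a different
    // cache-capable, non-stale datanode for work.blockId.
    System.out.println("rescheduling block " + work.blockId
        + " away from " + work.datanode);
  }
}
{code}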

The other problem is that we currently rely on the {{DatanodeManager}} to tell
us when a node is bad. Its timeouts are generous (10.5 minutes by default to
declare a node dead), so the proposed "rescheduler" would either have to
maintain its own list of who is naughty and nice, or have a really long period
(again overlapping with the rescanner thread). I don't really want to
duplicate the deadNodes list...
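
For illustration only (the interface below is invented, not the real
{{DatanodeManager}} API), the point is that the rescan can consult the existing
liveness bookkeeping rather than keeping a second naughty-or-nice list:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: reuse a single source of truth for node liveness
// instead of duplicating a deadNodes list inside a rescheduler.
public class LivenessCheckSketch {
  // Invented stand-in for what the existing liveness tracking already knows.
  interface NodeLiveness {
    boolean isDead(String datanode);   // declared dead (~10.5 min by default)
    boolean isStale(String datanode);  // missed recent heartbeats
  }

  private final NodeLiveness liveness;

  LivenessCheckSketch(NodeLiveness liveness) {
    this.liveness = liveness;
  }

  // Filter cache targets using the existing bookkeeping: skip stale nodes for
  // new caching work; cached replicas on such nodes are only written off once
  // the node is actually declared dead.
  List<String> chooseCacheTargets(List<String> candidates) {
    return candidates.stream()
        .filter(dn -> !liveness.isDead(dn))
        .filter(dn -> !liveness.isStale(dn))
        .collect(Collectors.toList());
  }
}
{code}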

I do think we should resend the {{DNA_CACHE}}, etc. commands, as I mentioned
above. Networks do lose messages, after all. But we might have to assume that
if a DN tells us it can cache X bytes, it's telling the truth. Otherwise, the
failure cases we have to think about tend to proliferate.
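
A minimal sketch of the kind of bounded resend I have in mind (again, all names
here are invented; the real command plumbing lives in the heartbeat response
path):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resend an unacknowledged DNA_CACHE command a bounded
// number of times before giving up, instead of sending it exactly once.
public class BoundedResendSketch {
  private static final int MAX_SENDS = 3;

  // Key is an invented "blockId@datanode" pair; value is how many times the
  // command has been sent so far.
  private final Map<String, Integer> sendCounts = new HashMap<>();

  /**
   * Called when we want the given datanode to cache the given block and it
   * has not yet reported the block as cached. Returns true if the command
   * should be (re)sent in the next heartbeat response, false if we have
   * given up on this node for this block.
   */
  boolean shouldSendCacheCommand(String blockId, String datanode) {
    String key = blockId + "@" + datanode;
    int sent = sendCounts.getOrDefault(key, 0);
    if (sent >= MAX_SENDS) {
      return false;  // give up; let the rescan pick a different node
    }
    sendCounts.put(key, sent + 1);
    return true;
  }

  // Called when the datanode's cache report confirms the block is cached.
  void onCacheConfirmed(String blockId, String datanode) {
    sendCounts.remove(blockId + "@" + datanode);
  }
}
{code}
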
> recaching improvements
> ----------------------
>
> Key: HDFS-5366
> URL: https://issues.apache.org/jira/browse/HDFS-5366
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: HDFS-4949
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
>
> There are a few things about our HDFS-4949 recaching strategy that could be
> improved.
> * We should monitor the DN's maximum and current mlock'ed memory consumption
> levels, so that we don't ask the DN to do stuff it can't.
> * We should not try to initiate caching on stale DataNodes (although we
> should not recache things stored on such nodes until they're declared dead).
> * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few
> times before giving up. Currently, we only send it once.
--
This message was sent by Atlassian JIRA
(v6.1#6144)