[
https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819708#comment-13819708
]
Andrew Wang commented on HDFS-5366:
-----------------------------------
Hey Colin, thanks for the patch. Basically only nitty review stuff here:
* Since we already have a config key named
{{"dfs.namenode.path.based.cache.refresh.interval.ms"}}, can we call this one
{{"dfs.namenode.path.based.cache.retry.interval.ms"}}?
* New configs should go in hdfs-default.xml too
* Nit: extra newline in DatanodeManager#getCacheCommand
* Javadoc on DatanodeDescriptor methods saying whether they take wallclock or
monotonic time
* Any reason to prefer the iterator-based removal over using clear? If it's not
necessary, we could not do this to keep the diff small.
* Extra imports in CacheReplicationMonitor, DatanodeDescriptor
* In DatanodeManager, having variables named {{sendingCachingCommands}} and
{{sendCachingCommands}} is confusing, rename to {{retryCachingCommands}} or
something?
* Testing, it looks like the only changes are some additional prints. Can we
get a test that actually verifies that the NN will resend cache/uncache
commands? I bet you can intercept the commands with a spy or something so they
don't reach the DN.
> recaching improvements
> ----------------------
>
> Key: HDFS-5366
> URL: https://issues.apache.org/jira/browse/HDFS-5366
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5366-caching.001.patch, HDFS-5366.002.patch
>
>
> There are a few things about our HDFS-4949 recaching strategy that could be
> improved.
> * We should monitor the DN's maximum and current mlock'ed memory consumption
> levels, so that we don't ask the DN to do stuff it can't.
> * We should not try to initiate caching on stale or decomissioning DataNodes
> (although we should not recache things stored on such nodes until they're
> declared dead).
> * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few
> times before giving up. Currently, we only send it once.
--
This message was sent by Atlassian JIRA
(v6.1#6144)