[ 
https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819708#comment-13819708
 ] 

Andrew Wang commented on HDFS-5366:
-----------------------------------

Hey Colin, thanks for the patch. Basically only nitty review stuff here:

* Since we already have a config key named 
{{"dfs.namenode.path.based.cache.refresh.interval.ms"}}, can we call this one 
{{"dfs.namenode.path.based.cache.retry.interval.ms"}}?
* New configs should go in hdfs-default.xml too
* Nit: extra newline in DatanodeManager#getCacheCommand
* Javadoc on DatanodeDescriptor methods saying whether they take wallclock or 
monotonic time
* Any reason to prefer the iterator-based removal over using clear? If it's not 
necessary, we could not do this to keep the diff small.
* Extra imports in CacheReplicationMonitor, DatanodeDescriptor
* In DatanodeManager, having variables named {{sendingCachingCommands}} and 
{{sendCachingCommands}} is confusing, rename to {{retryCachingCommands}} or 
something?
* Testing, it looks like the only changes are some additional prints. Can we 
get a test that actually verifies that the NN will resend cache/uncache 
commands? I bet you can intercept the commands with a spy or something so they 
don't reach the DN.

> recaching improvements
> ----------------------
>
>                 Key: HDFS-5366
>                 URL: https://issues.apache.org/jira/browse/HDFS-5366
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5366-caching.001.patch, HDFS-5366.002.patch
>
>
> There are a few things about our HDFS-4949 recaching strategy that could be 
> improved.
> * We should monitor the DN's maximum and current mlock'ed memory consumption 
> levels, so that we don't ask the DN to do stuff it can't.
> * We should not try to initiate caching on stale or decomissioning DataNodes 
> (although we should not recache things stored on such nodes until they're 
> declared dead).
> * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few 
> times before giving up.  Currently, we only send it once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to