[ https://issues.apache.org/jira/browse/HDFS-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854471#comment-13854471 ]

Andrew Wang commented on HDFS-5651:
-----------------------------------

This JIRA has turned out to be a lot harder than expected :) Maybe we should
rename it to reflect the real issue at hand, which is the deadlock (or, more
broadly, the transition to standby). Some review comments:

* I like that the lock protecting CRM (the CacheReplicationMonitor) isn't in
CRM itself.

Nits:
* Do you want to put the default value for the new config option in the apt.vm 
file too?
* Can you add a message to the Precondition checks?
* I see you removed the TODO for pending/underCached/etc block stats, do you 
want to file a follow-on for that?
* Log a warning or info message if the cachedBlocksPercent minimum kicks in.
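A minimal sketch of the last nit plus the Precondition-message nit. It assumes the setting is a percentage of heap in (0, 100] and uses a hypothetical floor of 0.001; the class, constant, and floor value are illustrative, not taken from the patch. It throws a plain IllegalArgumentException with a message instead of using Guava's Preconditions so that it has no dependencies:

```java
import java.util.logging.Logger;

/** Hypothetical helper; names and the floor value are illustrative only. */
final class CacheCapacity {
  private static final Logger LOG = Logger.getLogger(CacheCapacity.class.getName());

  // Hypothetical minimum percentage of heap for the cachedBlocks map.
  static final double MIN_CACHED_BLOCKS_PERCENT = 0.001;

  static double effectivePercent(double configured) {
    // Precondition with a descriptive message; the negated form also rejects NaN.
    if (!(configured > 0.0 && configured <= 100.0)) {
      throw new IllegalArgumentException(
          "cachedBlocksPercent must be in (0, 100], got " + configured);
    }
    if (configured < MIN_CACHED_BLOCKS_PERCENT) {
      // Tell the operator the configured value was overridden.
      LOG.warning("cachedBlocksPercent " + configured + " is below the minimum "
          + MIN_CACHED_BLOCKS_PERCENT + "; using the minimum instead");
      return MIN_CACHED_BLOCKS_PERCENT;
    }
    return configured;
  }
}
```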

Rest:
* This locking scheme, where we need to recheck shutdown and avoid modifying
cache manager state, feels like a potential landmine, especially since the
waitFor calls need to be moved all the way up to FSN (FSNamesystem). Could we
instead periodically check the thread's interrupt status in {{CRM#rescan}},
throw InterruptedException, and go back to joining on the CRM thread? waitFor
could check CRM's shutdown status and also throw InterruptedException. This
might also let the waitFor calls move back into CacheManager.
* Should we wipe out the various {{cached}} stats when we go to standby? I 
think the needed ones will be adjusted properly as the standby tails the edit 
log, but the cached ones will just sit there.
* It seems like we really should have a test for transition-to-standby while a
long CRM rescan is in progress. This seems doable with some test-injection
functions that force sleeps. If you want to stick with multiple CRM threads,
maybe also test flapping repeatedly between standby and active, checking for
thread cleanup and data consistency.
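A rough sketch of the interrupt-based alternative, using a hypothetical RescanMonitor class as a stand-in for CRM (none of these names come from the patch). rescan() polls the interrupt flag between blocks, the thread marks itself shut down on exit, waitForRescan() throws InterruptedException once the monitor has shut down, and a hypothetical Injector hook lets a test force a long sleep mid-rescan:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for CacheReplicationMonitor; not the Hadoop class. */
class RescanMonitor extends Thread {
  /** Hypothetical test-injection hook, called once per block during a rescan. */
  interface Injector {
    void duringRescan() throws InterruptedException;
  }

  private final ReentrantLock lock = new ReentrantLock();
  private final Condition scanFinished = lock.newCondition();
  private final Injector injector;
  private final int blocksPerScan;
  private final long intervalMs;
  private boolean shutdown = false; // guarded by lock
  private long completedScans = 0;  // guarded by lock

  RescanMonitor(Injector injector, int blocksPerScan, long intervalMs) {
    this.injector = injector;
    this.blocksPerScan = blocksPerScan;
    this.intervalMs = intervalMs;
    setDaemon(true);
  }

  @Override
  public void run() {
    try {
      while (true) {
        rescan();
        lock.lock();
        try {
          completedScans++;
          scanFinished.signalAll();
        } finally {
          lock.unlock();
        }
        TimeUnit.MILLISECONDS.sleep(intervalMs); // throws if interrupted
      }
    } catch (InterruptedException e) {
      // Interrupted by shutdownAndJoin(): fall through and exit.
    } finally {
      lock.lock();
      try {
        shutdown = true;
        scanFinished.signalAll(); // wake any waitForRescan() callers
      } finally {
        lock.unlock();
      }
    }
  }

  /** Scans blocks, checking the interrupt flag between each one. */
  private void rescan() throws InterruptedException {
    for (int i = 0; i < blocksPerScan; i++) {
      if (Thread.interrupted()) {
        throw new InterruptedException("rescan interrupted");
      }
      injector.duringRescan();
      // Real code would examine one cached block here.
    }
  }

  /** Waits for one more completed rescan; throws if the monitor shuts down first. */
  void waitForRescan() throws InterruptedException {
    lock.lock();
    try {
      long target = completedScans + 1;
      while (completedScans < target) {
        if (shutdown) {
          throw new InterruptedException("monitor shut down");
        }
        scanFinished.await();
      }
    } finally {
      lock.unlock();
    }
  }

  /** Called on transition to standby: interrupt, then join, holding no lock. */
  void shutdownAndJoin() throws InterruptedException {
    interrupt();
    join();
  }
}
```

One caveat the sketch glosses over: if the real rescan blocks on the FSN lock while the transitioning thread holds it, interrupting only helps if that lock is acquired interruptibly (for example with Lock.lockInterruptibly()).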

> remove dfs.namenode.caching.enabled
> -----------------------------------
>
>                 Key: HDFS-5651
>                 URL: https://issues.apache.org/jira/browse/HDFS-5651
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5651.001.patch, HDFS-5651.002.patch, 
> HDFS-5651.003.patch, HDFS-5651.004.patch, HDFS-5651.006.patch
>
>
> We can remove dfs.namenode.caching.enabled and simply always enable caching, 
> similar to how we do with snapshots and other features.  The main overhead is 
> the size of the cachedBlocks GSet.  However, we can simply make the size of 
> this GSet configurable, and people who don't want caching can set it to a 
> very small value.
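To make the sizing idea in the description concrete, here is a hedged sketch of turning a configured heap percentage into a GSet capacity. The per-entry byte cost, the power-of-two rounding, and the 2^30 cap are assumptions for illustration, not what the patch does; in practice maxHeapBytes would typically come from Runtime.getRuntime().maxMemory():

```java
/** Hypothetical sizing helper; not the patch's actual code. */
final class GSetSizing {
  /**
   * Gives the cachedBlocks map `percent` percent of maxHeapBytes, assuming
   * bytesPerEntry bytes per slot, rounded down to a power of two (at least 1).
   */
  static int capacityFor(long maxHeapBytes, double percent, int bytesPerEntry) {
    double bytes = maxHeapBytes * (percent / 100.0);
    long slots = (long) (bytes / bytesPerEntry);
    if (slots < 1) {
      return 1; // a tiny percentage effectively disables the cache map
    }
    long capped = Math.min(slots, 1L << 30); // keep the result within int range
    return Integer.highestOneBit((int) capped);
  }
}
```

For a 1 GiB heap at 0.25% with 8 bytes per slot this gives 262,144 slots, while a very small percentage collapses to a single slot, which is roughly the "set it to a very small value" case.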



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
