[ 
https://issues.apache.org/jira/browse/HDFS-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339120#comment-16339120
 ] 

Yiqun Lin commented on HDFS-13044:
----------------------------------

Hi [~elgoiri], there are some initial review comments from me.

*DFSConfigKeys.java*
 1. Can we rename config name of {{DFS_ROUTER_SAFEMODE_CACHE_EXPIRATION_MS}} 
from {{dfs.federation.router.safemode.cacheexpiration.count}} to 
{{dfs.federation.router.safemode.cache-expiration.interval}}? Why not use {{2 
or 3 * DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT}} as the default expiration 
interval time? Looks these two config are related, this will be better than 
just assigning a new value.
 2. It would be better to make these configs support multiple time units 
suffixes.
 3. Please document these configs in {{hdfs-default.xml}}.

*Router.java*
 1.Following lines need to update.
{code:java}
+    if (this.safemodeService == null) {
+      // Router is running now
+      // TODO
+    }
{code}
2. line125. This comment seems to be {{/** The start time of the Router. */}}. 
This was something I am missing caught before.

 *RouterRpcServer.java*
 line425: Why not define and throw a sub-exception of {{SafeModeException}}?
 line426: Missing "is" before {{in safe mode and cannot handle}}.
 line446: Can we rename {{isSafeMode}} to {{isInSafeMode}}?

*RouterSafemodeService.java*
 line33 and 36: Seems {{FederationStateStoreService}} is non-exist.
 line55: Maybe we need to define a new field named {{enterTime}} to calculate 
elapsed time during safemoe. And {{enterTime}} will be updated in {{enter}} 
method every time. We may experience enter/leave safemode many times in real 
env.
 line82: {{synchronized}} is not needed since {{MutableGaugeInt#set}} is 
thread-safe to use. It uses AtomicInteger type to record value inside.
 line84: Can we print the elapsed time during safemode as well?
 line90: {{(int)}} is not needed, it already did in 
{{RouterMetrics#setSafeModeTime}}.
 line110 and 117: Can we print the interval time with {{MILLISECONDS}} format? 
There is a chance the value loosing accurate. For example, we set the interval 
time 999ms, it will return 0 when convert to second format.

*TestRouterSafemode.java*
 line43: The javadoc looks confused. There are two duplicate {{the}}. 
{{SafeModeTimer}} is non-exist.
 line116: We are just for the testing, no need to sleep so long time. We can 
greatly decrease the interval time tested in this class. The same problem in 
{{testRouterEnterSafemode}}.
 Additional two test cases we may needed:
 1. Specific test for {{dfs.federation.router.safemode.startup.interval}}. That 
is say, router won't leaf safemode until passing interval time.
 2. The case that RouterRpcServer rejects the reuquests when it's in safemode.

 
{quote}We could also add an API in the Router Admin to set the safe mode by 
hand.
{quote}
+1 for this proposal. We  can file new JIRA for the tracking.

In addition, we need to document the meaning of router states. At least I see 
the semantic of safemode in RBF is different with that in normal HDFS.

> RBF: Add a safe mode for the Router
> -----------------------------------
>
>                 Key: HDFS-13044
>                 URL: https://issues.apache.org/jira/browse/HDFS-13044
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-13004.000.patch
>
>
> When a Router cannot communicate with the State Store, it should enter into a 
> safe mode that disallows certain operations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to