[
https://issues.apache.org/jira/browse/HDFS-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339120#comment-16339120
]
Yiqun Lin commented on HDFS-13044:
----------------------------------
Hi [~elgoiri], there are some initial review comments from me.
*DFSConfigKeys.java*
1. Can we rename config name of {{DFS_ROUTER_SAFEMODE_CACHE_EXPIRATION_MS}}
from {{dfs.federation.router.safemode.cacheexpiration.count}} to
{{dfs.federation.router.safemode.cache-expiration.interval}}? Why not use {{2
or 3 * DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT}} as the default expiration
interval time? Looks these two config are related, this will be better than
just assigning a new value.
2. It would be better to make these configs support multiple time units
suffixes.
3. Please document these configs in {{hdfs-default.xml}}.
*Router.java*
1.Following lines need to update.
{code:java}
+ if (this.safemodeService == null) {
+ // Router is running now
+ // TODO
+ }
{code}
2. line125. This comment seems to be {{/** The start time of the Router. */}}.
This was something I am missing caught before.
*RouterRpcServer.java*
line425: Why not define and throw a sub-exception of {{SafeModeException}}?
line426: Missing "is" before {{in safe mode and cannot handle}}.
line446: Can we rename {{isSafeMode}} to {{isInSafeMode}}?
*RouterSafemodeService.java*
line33 and 36: Seems {{FederationStateStoreService}} is non-exist.
line55: Maybe we need to define a new field named {{enterTime}} to calculate
elapsed time during safemoe. And {{enterTime}} will be updated in {{enter}}
method every time. We may experience enter/leave safemode many times in real
env.
line82: {{synchronized}} is not needed since {{MutableGaugeInt#set}} is
thread-safe to use. It uses AtomicInteger type to record value inside.
line84: Can we print the elapsed time during safemode as well?
line90: {{(int)}} is not needed, it already did in
{{RouterMetrics#setSafeModeTime}}.
line110 and 117: Can we print the interval time with {{MILLISECONDS}} format?
There is a chance the value loosing accurate. For example, we set the interval
time 999ms, it will return 0 when convert to second format.
*TestRouterSafemode.java*
line43: The javadoc looks confused. There are two duplicate {{the}}.
{{SafeModeTimer}} is non-exist.
line116: We are just for the testing, no need to sleep so long time. We can
greatly decrease the interval time tested in this class. The same problem in
{{testRouterEnterSafemode}}.
Additional two test cases we may needed:
1. Specific test for {{dfs.federation.router.safemode.startup.interval}}. That
is say, router won't leaf safemode until passing interval time.
2. The case that RouterRpcServer rejects the reuquests when it's in safemode.
{quote}We could also add an API in the Router Admin to set the safe mode by
hand.
{quote}
+1 for this proposal. We can file new JIRA for the tracking.
In addition, we need to document the meaning of router states. At least I see
the semantic of safemode in RBF is different with that in normal HDFS.
> RBF: Add a safe mode for the Router
> -----------------------------------
>
> Key: HDFS-13044
> URL: https://issues.apache.org/jira/browse/HDFS-13044
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Major
> Attachments: HDFS-13004.000.patch
>
>
> When a Router cannot communicate with the State Store, it should enter into a
> safe mode that disallows certain operations.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]