[
https://issues.apache.org/jira/browse/HDFS-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302822#comment-14302822
]
Tianyin Xu commented on HDFS-7726:
----------------------------------
Thanks a lot, Zhe! I refined the patch based on your feedback.
1. rpcTimeout can be negative. It is passed to the RPC protocol
(org.apache.hadoop.ipc.RPC.java), which does not explicitly define a valid range
for rpcTimeout. I tested negative values and they work fine. Let me know if you
want rpcTimeout to be checked as positive.
2. Yes, the new patch uses Preconditions.checkArgument.
3. Fixed!
4. Removed the line.
5. Now it ends with .patch :)
> Parse and check the configuration settings of edit log to prevent runtime
> errors
> --------------------------------------------------------------------------------
>
> Key: HDFS-7726
> URL: https://issues.apache.org/jira/browse/HDFS-7726
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Reporter: Tianyin Xu
> Priority: Minor
> Attachments: check_config_EditLogTailer.patch,
> check_config_val_EditLogTailer.patch.1
>
>
> ============================
> Problem
> -------------------------------------------------
> Similar to the following two issues addressed in 2.7.0,
> https://issues.apache.org/jira/browse/YARN-2165
> https://issues.apache.org/jira/browse/YARN-2166
> The edit-log-related configuration settings should be checked in the
> constructor rather than applied directly at runtime. Applying them unchecked
> causes runtime failures if the values are wrong.
> Take "dfs.ha.tail-edits.period" as an example. Currently, in
> EditLogTailer.java, its value is not checked but used directly in doWork(),
> as shown in the following code snippet. Any negative value causes an
> IllegalArgumentException (which is not caught) and impairs the component.
> {code:title=EditLogTailer.java|borderStyle=solid}
> private void doWork() {
> .....
> Thread.sleep(sleepTimeMs);
> ....
> }
> {code}
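The failure mode described above can be reproduced outside Hadoop. This is a minimal sketch, not code from EditLogTailer: the class and method names are hypothetical, and it only shows that Thread.sleep rejects a negative period with an IllegalArgumentException.

```java
// Illustrative only: NegativeSleepDemo and sleepRejects are hypothetical names,
// not part of EditLogTailer or the attached patch.
final class NegativeSleepDemo {

    /** Returns true if Thread.sleep rejects the period with IllegalArgumentException. */
    static boolean sleepRejects(long sleepTimeMs) {
        try {
            Thread.sleep(sleepTimeMs);
            return false; // the sleep was accepted
        } catch (IllegalArgumentException e) {
            return true; // thrown by Thread.sleep for negative values
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("negative period rejected: " + sleepRejects(-1000L));
        System.out.println("zero period rejected: " + sleepRejects(0L));
    }
}
```

If nothing in the loop catches the exception, the thread that runs the loop terminates, which is why an early check is preferable.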
> Another example is "dfs.ha.log-roll.rpc.timeout". Right now, we use getInt()
> to parse the value at runtime in the getActiveNodeProxy() function, which is
> called by doWork(), as shown below. Any erroneous setting (e.g., a
> malformed integer) causes an exception.
> {code:title=EditLogTailer.java|borderStyle=solid}
> private NamenodeProtocol getActiveNodeProxy() throws IOException {
> .....
> int rpcTimeout = conf.getInt(
> DFSConfigKeys.DFS_HA_LOGROLL_RPC_TIMEOUT_KEY,
> DFSConfigKeys.DFS_HA_LOGROLL_RPC_TIMEOUT_DEFAULT);
> ....
> }
> {code}
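To illustrate why parsing at the point of use is risky, here is a small sketch that uses java.util.Properties as a stand-in for the Hadoop Configuration object. The helper names and the default value are assumptions made for the example; only the configuration key comes from the description above.

```java
import java.util.Properties;

// Illustrative only: Properties stands in for the Hadoop Configuration object,
// and all names below except the configuration key are hypothetical.
final class LazyParseDemo {
    static final String KEY = "dfs.ha.log-roll.rpc.timeout";
    static final int PLACEHOLDER_DEFAULT = 20000; // placeholder, not read from DFSConfigKeys

    /** Parses the value only when asked, the way a lazy getInt-style lookup would. */
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Returns true if the raw setting fails to parse at the point of use. */
    static boolean failsToParse(String raw) {
        Properties conf = new Properties();
        conf.setProperty(KEY, raw);
        try {
            getInt(conf, KEY, PLACEHOLDER_DEFAULT);
            return false;
        } catch (NumberFormatException e) {
            return true; // the error surfaces only when the value is first read
        }
    }

    public static void main(String[] args) {
        System.out.println("\"20s\" fails: " + failsToParse("20s"));
        System.out.println("\"20000\" fails: " + failsToParse("20000"));
    }
}
```

Setting the value succeeds silently in both cases; the malformed one is discovered only when the lookup runs.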
> ============================
> Solution (the attached patch)
> -------------------------------------------------
> Basically, the idea of the attached patch is to move the parsing and checking
> logic into the constructor to expose errors at initialization, so that
> they are not latent at runtime (same as YARN-2165 and YARN-2166).
> I'm not aware of the implementation in 2.7.0. It seems there are checking
> utilities such as the validatePositiveNonZero function in YARN-2165. If so,
> we can use them to make the checking more systematic.
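For readers who have not opened the attachment, the fail-fast idea can be sketched roughly as follows. This is not the patch itself: the class name, the Properties stand-in for the Hadoop configuration, and the plain IllegalArgumentException (instead of Preconditions.checkArgument) are all assumptions made to keep the example self-contained. Units are not modeled, and rpcTimeout is parsed eagerly but not range-checked, matching the comment above that negative values are tolerated.

```java
import java.util.Properties;

// Illustrative only: TailerSettings is a hypothetical class, not EditLogTailer.
final class TailerSettings {
    static final String TAIL_PERIOD_KEY = "dfs.ha.tail-edits.period";
    static final String ROLL_TIMEOUT_KEY = "dfs.ha.log-roll.rpc.timeout";

    final int tailPeriod;
    final int rpcTimeout;

    /** Parses and checks both settings up front so bad values fail at construction. */
    TailerSettings(Properties conf, int defaultPeriod, int defaultTimeout) {
        tailPeriod = parseInt(conf, TAIL_PERIOD_KEY, defaultPeriod);
        if (tailPeriod < 0) {
            throw new IllegalArgumentException(
                TAIL_PERIOD_KEY + " must be non-negative, got " + tailPeriod);
        }
        // Parsed eagerly, but no range check: negative timeouts are tolerated.
        rpcTimeout = parseInt(conf, ROLL_TIMEOUT_KEY, defaultTimeout);
    }

    private static int parseInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid integer for " + key + ": " + raw, e);
        }
    }

    /** Convenience helper for the example: true if both raw values are accepted. */
    static boolean accepts(String period, String timeout) {
        Properties conf = new Properties();
        conf.setProperty(TAIL_PERIOD_KEY, period);
        conf.setProperty(ROLL_TIMEOUT_KEY, timeout);
        try {
            new TailerSettings(conf, 60, 20000); // placeholder defaults
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With this shape, a bad value stops initialization with a message naming the offending key, rather than surfacing later inside the tailing loop.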
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)