[
https://issues.apache.org/jira/browse/HDFS-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500770#comment-14500770
]
Xiaoyu Yao commented on HDFS-8179:
----------------------------------
This is a bug in DFSClient#getServerDefaults(). There are two issues here:
1. DFSClient#serverDefaults is null within 1 hour
(SERVER_DEFAULTS_VALIDITY_PERIOD) of NN start. The null serverDefaults is
returned from DFSClient#getServerDefaults(), which causes the NPE in "hadoop
fs -rm -r".
2. HDFS-6841 changed the code to use Time#monotonicNow() instead of
Time.now(). Time#monotonicNow() returns the time elapsed since an arbitrary
origin, so its value should only be compared with other values returned from
Time#monotonicNow(). But serverDefaultsLastUpdate has an initial value of 0
through default initialization. It should be initialized with
Time.monotonicNow(). Arpit Agarwal found a similar issue in HDFS-8163.
{code}
public FsServerDefaults getServerDefaults() throws IOException {
  long now = Time.monotonicNow();
  if (now - serverDefaultsLastUpdate > SERVER_DEFAULTS_VALIDITY_PERIOD) {
    serverDefaults = namenode.getServerDefaults();
    serverDefaultsLastUpdate = now;
  }
  return serverDefaults;
}
{code}
The proposed fix:
{code}
public FsServerDefaults getServerDefaults() throws IOException {
  long now = Time.monotonicNow();
  // Also refresh when nothing has been fetched yet, so null is never returned.
  if ((serverDefaults == null) ||
      (now - serverDefaultsLastUpdate > SERVER_DEFAULTS_VALIDITY_PERIOD)) {
    serverDefaults = namenode.getServerDefaults();
    serverDefaultsLastUpdate = now;
  }
  return serverDefaults;
}
{code}
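The difference between the two versions can be reproduced outside HDFS with a small stand-alone sketch. Everything in it is hypothetical and only mirrors the logic quoted above: ServerDefaultsCache is not a Hadoop class, the LongSupplier stands in for Time.monotonicNow(), the Supplier stands in for namenode.getServerDefaults(), and the validity period is assumed to be one hour in milliseconds.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for the caching logic in DFSClient; not the real class. */
class ServerDefaultsCache<T> {
    // Assumed to match the 1-hour SERVER_DEFAULTS_VALIDITY_PERIOD, in milliseconds.
    static final long VALIDITY_PERIOD_MS = 60L * 60 * 1000;

    private final LongSupplier monotonicClock; // stands in for Time.monotonicNow()
    private final Supplier<T> fetcher;         // stands in for namenode.getServerDefaults()
    private T cached;                          // null until the first fetch
    private long lastUpdate;                   // default-initialized to 0

    ServerDefaultsCache(LongSupplier monotonicClock, Supplier<T> fetcher) {
        this.monotonicClock = monotonicClock;
        this.fetcher = fetcher;
    }

    /** Original logic: skips the fetch while the clock reads less than the period. */
    T getBuggy() {
        long now = monotonicClock.getAsLong();
        if (now - lastUpdate > VALIDITY_PERIOD_MS) {
            cached = fetcher.get();
            lastUpdate = now;
        }
        return cached;
    }

    /** Proposed logic: also fetches whenever nothing has been cached yet. */
    T getFixed() {
        long now = monotonicClock.getAsLong();
        if (cached == null || now - lastUpdate > VALIDITY_PERIOD_MS) {
            cached = fetcher.get();
            lastUpdate = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        // Simulate a monotonic clock that reads 5 minutes past its arbitrary origin.
        ServerDefaultsCache<String> cache =
            new ServerDefaultsCache<>(() -> 5L * 60 * 1000, () -> "defaults");
        System.out.println("buggy: " + cache.getBuggy()); // prints "buggy: null"
        System.out.println("fixed: " + cache.getFixed()); // prints "fixed: defaults"
    }
}
```

With the simulated clock at 5 minutes, the original check compares 300000 - 0 against 3600000, skips the fetch, and returns the never-populated field. The added null check makes the first call fetch regardless of what the clock reads.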
> DFSClient#getServerDefaults returns null within 1 hour of NN start
> ------------------------------------------------------------------
>
> Key: HDFS-8179
> URL: https://issues.apache.org/jira/browse/HDFS-8179
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
>
> We recently hit an NPE during the Ambari Oozie service check. The failed hdfs
> command is below. It reproduces intermittently and then goes away after the
> cluster runs for a while.
> {code}
> [ambari-qa@c6401 ~]$ hadoop --config /etc/hadoop/conf fs -rm -r /user/ambari-qa/mapredsmokeoutput
> rm: Failed to get server trash configuration: null. Consider using -skipTrash option
> {code}
> With additional tracing, the failure was traced to the following stack.
> {code}
> 15/04/17 20:57:12 DEBUG fs.Trash: Failed to get server trash configuration
> java.lang.NullPointerException
> at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:86)
> at org.apache.hadoop.fs.shell.Delete$Rm.moveToTrash(Delete.java:117)
> at org.apache.hadoop.fs.shell.Delete$Rm.processPath(Delete.java:104)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:321)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:293)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:275)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:259)
> at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:205)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:166)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> rm: Failed to get server trash configuration: null. Consider using -skipTrash option
> {code}