[
https://issues.apache.org/jira/browse/HDFS-16697?focusedWorklogId=795934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795934
]
ASF GitHub Bot logged work on HDFS-16697:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Jul/22 05:28
Start Date: 28/Jul/22 05:28
Worklog Time Spent: 10m
Work Description: Likkey closed pull request #4641: HDFS-16697.Randomly
setting “dfs.namenode.resource.checked.volumes.minimum” will always prevent
safe mode from being turned off
URL: https://github.com/apache/hadoop/pull/4641
Issue Time Tracking
-------------------
Worklog Id: (was: 795934)
Time Spent: 20m (was: 10m)
> Randomly setting “dfs.namenode.resource.checked.volumes.minimum” will always
> prevent safe mode from being turned off
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16697
> URL: https://issues.apache.org/jira/browse/HDFS-16697
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.1.3
> Environment: Linux version 4.15.0-142-generic
> (buildd@lgw01-amd64-039) (gcc version 5.4.0 20160609 (Ubuntu
> 5.4.0-6ubuntu1~16.04.12))
> java version "1.8.0_162"
> Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
> Reporter: Jingxuan Fu
> Assignee: Jingxuan Fu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.1.3
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {code:java}
> <property>
> <name>dfs.namenode.resource.checked.volumes.minimum</name>
> <value>1</value>
> <description>
> The minimum number of redundant NameNode storage volumes required.
> </description>
> </property>{code}
> I found that when the value of
> “dfs.namenode.resource.checked.volumes.minimum” is set greater than the
> total number of storage volumes in the NameNode, safe mode can never be
> turned off. While in safe mode, the file system accepts only read
> requests and rejects deletes, modifications, and other change requests,
> so its functionality is severely limited.
> The default value of this configuration item is 1; we set it to 2 as an
> example for illustration. After starting HDFS, the NameNode log and the
> client throw the following messages.
> {code:java}
> 2022-07-27 17:37:31,772 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on
> available disk space. Already in safe mode.
> 2022-07-27 17:37:31,772 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe
> mode is ON.
> Resources are low on NN. Please add or free up more resourcesthen turn off
> safe mode manually. NOTE: If you turn off safe mode before adding resources,
> the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode
> leave" to turn safe mode off.
> {code}
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create
> directory /hdfsapi/test. Name node is in safe mode.
> Resources are low on NN. Please add or free up more resourcesthen turn off
> safe mode manually. NOTE: If you turn off safe mode before adding resources,
> the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode
> leave" to turn safe mode off. NamenodeHostName:192.168.1.167
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1468)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1455)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3174)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1145)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
> at java.base/java.security.AccessController.doPrivileged(Native
> Method)
> at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916){code}
> According to the prompt, one would believe there is not enough resource
> space to meet the conditions for leaving safe mode. However, even after
> adding or freeing up more resources and lowering the resource condition
> threshold "dfs.namenode.resource.du.reserved", safe mode still cannot be
> turned off and the same message is thrown.
> From the source code we know that the NameNode enters safe mode whenever
> the number of its storage volumes with redundant space is less than the
> minimum set by "dfs.namenode.resource.checked.volumes.minimum".
> After debugging, *we found that the current NameNode storage volumes
> have abundant resource space, but because the total number of NameNode
> storage volumes is less than the configured value, the number of volumes
> with redundant space is necessarily also less than the configured value,
> so the NameNode always stays in safe mode.*
> In summary, the configuration item lacks a validity check and an
> associated exception handling mechanism, which makes it impossible to
> find the root cause when a misconfiguration occurs.
> The solution I propose is to use Preconditions.checkArgument() to
> validate the value of the configuration item and throw an
> IllegalArgumentException with a detailed error message when the value is
> greater than the number of NameNode storage volumes, so that the
> misconfiguration cannot affect the subsequent operation of the program.
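The behaviour described above can be illustrated with a simplified sketch of the redundancy check (this is not the exact Hadoop source; `areResourcesAvailable` here is a stripped-down stand-in for the policy in `NameNodeResourcePolicy`, and the boolean list is an illustrative stand-in for the per-volume free-space checks):

```java
import java.util.Arrays;
import java.util.List;

public class ResourceCheckSketch {

  // Simplified stand-in for the redundant-volume policy: each boolean is
  // one NameNode storage volume and whether it currently has enough free
  // space (illustrative model, not Hadoop's actual data structures).
  static boolean areResourcesAvailable(List<Boolean> volumeHasSpace,
                                       int minimumRedundantVolumes) {
    int available = 0;
    for (boolean hasSpace : volumeHasSpace) {
      if (hasSpace) {
        available++;
      }
    }
    // Safe mode is entered (false returned) whenever fewer than
    // 'minimumRedundantVolumes' volumes have free space. If the configured
    // minimum exceeds the total number of volumes, this condition can
    // never be satisfied, regardless of how much space is free.
    return available >= minimumRedundantVolumes;
  }

  public static void main(String[] args) {
    List<Boolean> oneVolume = Arrays.asList(true); // one volume, plenty of space
    System.out.println(areResourcesAvailable(oneVolume, 1)); // true
    System.out.println(areResourcesAvailable(oneVolume, 2)); // false: minimum > total volumes
  }
}
```

With one volume and a minimum of 2, the check fails even though the volume has abundant space, matching the observed permanent safe mode.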
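The proposed validation could look roughly like the following. The pull request uses Guava's Preconditions.checkArgument(); a plain equivalent is shown here so the snippet is self-contained, and `validateCheckedVolumesMinimum`, `checkedVolumesMinimum`, and `totalVolumes` are illustrative names, not Hadoop identifiers:

```java
public class ConfigCheckSketch {

  // Hypothetical startup-time check: reject a configured minimum that can
  // never be met, instead of silently entering permanent safe mode later.
  static void validateCheckedVolumesMinimum(int checkedVolumesMinimum,
                                            int totalVolumes) {
    if (checkedVolumesMinimum > totalVolumes) {
      throw new IllegalArgumentException(
          "dfs.namenode.resource.checked.volumes.minimum ("
              + checkedVolumesMinimum + ") must not exceed the number of "
              + "NameNode storage volumes (" + totalVolumes + "), otherwise "
              + "safe mode can never be left automatically.");
    }
  }

  public static void main(String[] args) {
    validateCheckedVolumesMinimum(1, 1); // valid configuration, no exception
    try {
      validateCheckedVolumesMinimum(2, 1); // misconfiguration
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast at configuration-load time surfaces the root cause directly, rather than leaving the operator to diagnose a NameNode that refuses to leave safe mode.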
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]