Jingxuan Fu created HDFS-16697:
----------------------------------

             Summary: Setting 
“dfs.namenode.resource.checked.volumes.minimum” greater than the number of 
NameNode storage volumes will always prevent safe mode from being turned off
                 Key: HDFS-16697
                 URL: https://issues.apache.org/jira/browse/HDFS-16697
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.1.3
            Reporter: Jingxuan Fu
            Assignee: Jingxuan Fu
             Fix For: 3.1.3


 
{code:java}
<property>
  <name>dfs.namenode.resource.checked.volumes.minimum</name>
  <value>1</value>
  <description>
    The minimum number of redundant NameNode storage volumes required.
  </description>
</property>{code}
 

We found that when the value of 
“dfs.namenode.resource.checked.volumes.minimum” is set greater than the total 
number of storage volumes in the NameNode, safe mode can never be turned off. 
While in safe mode, the file system only accepts read requests and rejects 
deletes, modifications and other write requests, so its functionality is 
severely limited.

The default value of this configuration item is 1; we set it to 2 as an 
example. After starting HDFS, the NameNode log and the client both show the 
following messages.

 
{code:java}
2022-07-27 17:37:31,772 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available 
disk space. Already in safe mode.
2022-07-27 17:37:31,772 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
mode is ON.
Resources are low on NN. Please add or free up more resources then turn off safe 
mode manually. NOTE:  If you turn off safe mode before adding resources, the NN 
will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to 
turn safe mode off.
{code}
 

 
{code:java}
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create 
directory /hdfsapi/test. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe 
mode manually. NOTE:  If you turn off safe mode before adding resources, the NN 
will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to 
turn safe mode off. NamenodeHostName:192.168.1.167
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1468)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1455)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3174)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1145)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916){code}
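
For completeness, the misconfiguration can also be reproduced programmatically. 
The following is only a rough sketch against the MiniDFSCluster test harness 
(class and builder option names are as we recall them from the Hadoop test 
code; it is illustrative, not a tested case):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class CheckedVolumesMinimumRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Misconfiguration under test: a value larger than the number of
    // distinct NameNode storage volumes on a single-disk test machine.
    conf.setInt("dfs.namenode.resource.checked.volumes.minimum", 2);

    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)
        .waitSafeMode(false) // do not block waiting for safe mode to clear
        .build();
    try {
      cluster.waitClusterUp();
      // The NameNode reports "resources low" and stays in safe mode even
      // though its single volume has plenty of free space.
      System.out.println("NameNode in safe mode: "
          + cluster.getNameNode().isInSafeMode());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}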
 

Based on this message, one would believe that there is not enough resource 
space to satisfy the conditions for leaving safe mode. However, after adding 
or freeing up more resources and lowering the resource threshold 
"dfs.namenode.resource.du.reserved", the NameNode still fails to leave safe 
mode and throws the same message.

From the source code we know that the NameNode enters safe mode when the 
number of storage volumes with redundant space is less than the minimum set 
by "dfs.namenode.resource.checked.volumes.minimum". After debugging, *we 
found that the current NameNode storage volumes all have abundant free space, 
but because the total number of NameNode storage volumes is less than the 
configured value, the number of volumes with redundant space is necessarily 
also less than the configured value, so the NameNode always stays in safe 
mode.*
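
To make the failure mode concrete, here is a paraphrased, simplified sketch of 
the redundant-volume policy (modelled on 
org.apache.hadoop.hdfs.server.namenode.NameNodeResourcePolicy; this is 
illustrative, not the verbatim Hadoop source):
{code:java}
// Minimal stand-in for the CheckableNameNodeResource interface used by the
// NameNode resource checker (simplified for illustration).
interface CheckableNameNodeResource {
  boolean isRequired();          // volume must always have free space
  boolean isResourceAvailable(); // volume currently has enough free space
}

final class ResourcePolicySketch {
  // Returns true only if enough redundant volumes still have free space.
  static boolean areResourcesAvailable(
      Iterable<? extends CheckableNameNodeResource> resources,
      int minimumRedundantResources) {
    int redundant = 0;
    int redundantWithoutSpace = 0;
    for (CheckableNameNodeResource r : resources) {
      if (r.isRequired()) {
        if (!r.isResourceAvailable()) {
          return false; // a required volume without space => resources low
        }
      } else {
        redundant++;
        if (!r.isResourceAvailable()) {
          redundantWithoutSpace++;
        }
      }
    }
    // If the total number of redundant volumes is already smaller than
    // minimumRedundantResources, this can never hold, no matter how much
    // free space each volume has, so the NameNode stays in safe mode.
    return redundant - redundantWithoutSpace >= minimumRedundantResources;
  }
}
{code}
Assuming no volumes are marked as required (the common case), a total of one 
checked volume and a configured minimum of 2 makes the return value false 
forever, which matches the observed behavior.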

In summary, the configuration item lacks a validity check and an associated 
exception handling mechanism, which makes it difficult to find the root cause 
when such a misconfiguration occurs.

The solution I propose is to use Preconditions.checkArgument() to validate the 
value of the configuration item and emit a clear message when it is greater 
than the number of NameNode storage volumes, so that the misconfiguration does 
not silently affect the subsequent operation of the program.
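
A minimal sketch of what such a check could look like (the class, method and 
message wording are illustrative, not the actual patch; the default of 1 
matches hdfs-default.xml):
{code:java}
import com.google.common.base.Preconditions;

import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of the proposed validation: fail fast, with a clear
// message, when the configured minimum exceeds the number of NameNode
// storage volumes that the resource checker actually monitors.
final class CheckedVolumesMinimumValidator {
  static void validate(Configuration conf, int checkedVolumeCount) {
    int minimumRedundantVolumes = conf.getInt(
        "dfs.namenode.resource.checked.volumes.minimum", 1);
    Preconditions.checkArgument(
        minimumRedundantVolumes <= checkedVolumeCount,
        "dfs.namenode.resource.checked.volumes.minimum (%s) is greater than"
            + " the number of checked NameNode storage volumes (%s); the"
            + " NameNode would never be able to leave safe mode",
        minimumRedundantVolumes, checkedVolumeCount);
  }
}
{code}
The check could be invoked once the resource checker knows the set of volumes 
it monitors, so that the misconfiguration is reported at startup instead of 
surfacing as a permanent safe mode.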


