[
https://issues.apache.org/jira/browse/HDFS-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ayush Saxena resolved HDFS-16697.
---------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed
> Add logs if resources are not available in NameNodeResourcePolicy
> -----------------------------------------------------------------
>
> Key: HDFS-16697
> URL: https://issues.apache.org/jira/browse/HDFS-16697
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.1.3
> Environment: Linux version 4.15.0-142-generic
> (buildd@lgw01-amd64-039) (gcc version 5.4.0 20160609 (Ubuntu
> 5.4.0-6ubuntu1~16.04.12))
> java version "1.8.0_162"
> Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
> Reporter: ECFuzz
> Assignee: ECFuzz
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> {code:xml}
> <property>
>   <name>dfs.namenode.resource.checked.volumes.minimum</name>
>   <value>1</value>
>   <description>
>     The minimum number of redundant NameNode storage volumes required.
>   </description>
> </property>
> {code}
> I found that when the value of "dfs.namenode.resource.checked.volumes.minimum" is set greater than the total
> number of storage volumes on the NameNode, it becomes impossible to ever leave safe mode. While in safe mode,
> the file system only accepts read requests and rejects delete, modify, and other mutating requests, so its
> functionality is severely limited.
> The default value of the configuration item is 1; for illustration we set it to 2, as in the fragment below.
> After starting HDFS, the NameNode log and the client both show the relevant reminders.
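> A minimal hdfs-site.xml fragment for this scenario (assuming the NameNode has only one checked storage volume, so 2 exceeds the total):
> {code:xml}
> <property>
>   <name>dfs.namenode.resource.checked.volumes.minimum</name>
>   <value>2</value>
> </property>
> {code}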
> {code:java}
> 2022-07-27 17:37:31,772 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Already in safe mode.
> 2022-07-27 17:37:31,772 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is ON.
> Resources are low on NN. Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
> {code}
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /hdfsapi/test. Name node is in safe mode.
> Resources are low on NN. Please add or free up more resourcesthen turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off. NamenodeHostName:192.168.1.167
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1468)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1455)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3174)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1145)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916){code}
> According to the message, one would believe that there is not enough resource space to meet the conditions for
> leaving safe mode. However, after adding or freeing up more resources and lowering the resource threshold
> "dfs.namenode.resource.du.reserved" (for example as below), it still fails to leave safe mode and throws the
> same message.
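> An illustrative hdfs-site.xml fragment for lowering that threshold (the value here is only an example; the default is 104857600 bytes, i.e. 100 MB):
> {code:xml}
> <property>
>   <name>dfs.namenode.resource.du.reserved</name>
>   <!-- 10 MB instead of the default 100 MB -->
>   <value>10485760</value>
> </property>
> {code}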
> From the source code, we know that the NameNode enters safe mode if the number of storage volumes with
> redundant space is less than the minimum set by "dfs.namenode.resource.checked.volumes.minimum".
> After debugging, *we found that the current NameNode storage volumes all have abundant space, but because the
> total number of NameNode storage volumes is less than the configured value, the number of volumes with
> redundant space is necessarily also less than the configured value, so the NameNode always enters safe mode.*
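> For reference, here is a condensed paraphrase of the check in NameNodeResourcePolicy#areResourcesAvailable (simplified from the Hadoop source; comments are ours, not the exact code):
> {code:java}
> import java.util.Collection;
>
> // CheckableNameNodeResource is the Hadoop interface exposing
> // isRequired() and isResourceAvailable() for each checked volume.
> static boolean areResourcesAvailable(
>     Collection<? extends CheckableNameNodeResource> resources,
>     int minimumRedundantResources) {
>   int redundantResourceCount = 0;
>   int disabledRedundantResourceCount = 0;
>   for (CheckableNameNodeResource resource : resources) {
>     if (!resource.isRequired()) {
>       redundantResourceCount++;
>       if (!resource.isResourceAvailable()) {
>         disabledRedundantResourceCount++;
>       }
>     } else if (!resource.isResourceAvailable()) {
>       return false; // a required resource is unavailable
>     }
>   }
>   // With N redundant volumes in total, at most N can be available, so if
>   // minimumRedundantResources > N this condition can never hold: the
>   // NameNode stays in safe mode no matter how much space is freed.
>   return redundantResourceCount - disabledRedundantResourceCount
>       >= minimumRedundantResources;
> }
> {code}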
> In summary, the configuration item lacks a validity check and an associated warning mechanism, which makes it
> impossible to find the root cause when such a misconfiguration occurs.
> The solution I propose is to add a check on the value of this configuration item: print a warning message in
> the log when the value is greater than the number of NameNode storage volumes, so that the problem can be
> diagnosed in time and the misconfiguration does not silently affect subsequent operation of the program.
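> A hypothetical sketch of such a check (variable names reuse the paraphrase above; this is an illustration, not necessarily the merged patch):
> {code:java}
> // Illustrative only: emit a diagnostic when the configured minimum can
> // never be met, instead of silently staying in safe mode forever.
> if (minimumRedundantResources > redundantResourceCount) {
>   LOG.warn("dfs.namenode.resource.checked.volumes.minimum ("
>       + minimumRedundantResources + ") is greater than the number of "
>       + "redundant NameNode storage volumes (" + redundantResourceCount
>       + "); resources will never be reported as available.");
> }
> {code}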