[ 
https://issues.apache.org/jira/browse/HDFS-16697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721772#comment-17721772
 ] 

ASF GitHub Bot commented on HDFS-16697:
---------------------------------------

Likkey closed pull request #5569: HDFS-16697.Add code to check for 
minimumRedundantVolumes.
URL: https://github.com/apache/hadoop/pull/5569




> Randomly setting “dfs.namenode.resource.checked.volumes.minimum” will always 
> prevent safe mode from being turned off
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16697
>                 URL: https://issues.apache.org/jira/browse/HDFS-16697
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.1.3
>         Environment: Linux version 4.15.0-142-generic 
> (buildd@lgw01-amd64-039) (gcc version 5.4.0 20160609 (Ubuntu 
> 5.4.0-6ubuntu1~16.04.12))
> java version "1.8.0_162"
> Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
>            Reporter: ECFuzz
>            Assignee: ECFuzz
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {code:xml}
> <property>
>   <name>dfs.namenode.resource.checked.volumes.minimum</name>
>   <value>1</value>
>   <description>
>     The minimum number of redundant NameNode storage volumes required.
>   </description>
> </property>{code}
> I found that when “dfs.namenode.resource.checked.volumes.minimum” is set to a
> value greater than the total number of storage volumes in the NameNode, safe
> mode can never be turned off. While in safe mode, the file system accepts
> only read requests and rejects delete, modify, and other change requests,
> which severely limits its functionality.
> The default value of the configuration item is 1. Taking a value of 2 as an
> example, after HDFS starts, the NameNode log and the client show the
> following messages.
> {code:java}
> 2022-07-27 17:37:31,772 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Already in safe mode.
> 2022-07-27 17:37:31,772 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.
> Resources are low on NN. Please add or free up more resourcesthen turn off 
> safe mode manually. NOTE:  If you turn off safe mode before adding resources, 
> the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode 
> leave" to turn safe mode off.
> {code}
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create 
> directory /hdfsapi/test. Name node is in safe mode.
> Resources are low on NN. Please add or free up more resourcesthen turn off 
> safe mode manually. NOTE:  If you turn off safe mode before adding resources, 
> the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode 
> leave" to turn safe mode off. NamenodeHostName:192.168.1.167
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1468)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1455)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3174)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1145)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:714)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>         at java.base/java.security.AccessController.doPrivileged(Native 
> Method)
>         at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916){code}
> The message suggests that there is not enough resource space to satisfy the
> conditions for leaving safe mode. However, after adding or freeing resources
> and lowering the reserved-space threshold
> "dfs.namenode.resource.du.reserved", safe mode still cannot be turned off and
> the same message is shown.
> According to the source code, the NameNode enters safe mode when the number
> of redundant storage volumes with available space is less than the minimum
> set by "dfs.namenode.resource.checked.volumes.minimum".
> After debugging, *we found that the current NameNode storage volumes have
> abundant free space, but because the total number of NameNode storage
> volumes is less than the configured value, the number of volumes with
> sufficient space is necessarily also less than that value, so the NameNode
> always enters safe mode.*
> In summary, the configuration item lacks a validity check and associated
> exception handling, which makes it hard to find the root cause when a
> misconfiguration occurs.
> The solution I propose is to add a check on the value of this configuration
> item that throws an IllegalArgumentException with a detailed error message
> when the value is greater than the number of NameNode storage volumes, and
> logs a warning, so that the misconfiguration does not affect subsequent
> operations of the program.
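
The failure mode and the proposed check described in the issue can be illustrated with a small standalone sketch. This is not the code from the pull request and does not use Hadoop classes: the class name VolumeMinimumCheck, both method names, and the simplified "a volume is healthy when its free space is at least the reserved amount" rule are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the behaviour described in HDFS-16697.
 * Not Hadoop code: names and the health rule are illustrative assumptions.
 */
final class VolumeMinimumCheck {
    private VolumeMinimumCheck() {}

    /**
     * Simplified resource policy: resources count as available only if at
     * least minimumRedundantVolumes volumes have reservedBytes or more free.
     */
    static boolean hasEnoughResources(List<Long> availableBytesPerVolume,
                                      long reservedBytes,
                                      int minimumRedundantVolumes) {
        long healthy = availableBytesPerVolume.stream()
                .filter(free -> free >= reservedBytes)
                .count();
        return healthy >= minimumRedundantVolumes;
    }

    /**
     * Sketch of the proposed validation: reject a minimum that can never be
     * met because it exceeds the number of configured volumes.
     */
    static void validateMinimum(int minimumRedundantVolumes, int totalVolumes) {
        if (minimumRedundantVolumes > totalVolumes) {
            throw new IllegalArgumentException(
                "dfs.namenode.resource.checked.volumes.minimum ("
                + minimumRedundantVolumes
                + ") is greater than the number of NameNode storage volumes ("
                + totalVolumes + "); safe mode could never be left.");
        }
    }

    public static void main(String[] args) {
        // One volume with plenty of free space (example numbers only).
        List<Long> volumes = Arrays.asList(500L * 1024 * 1024 * 1024);
        long reserved = 100L * 1024 * 1024;

        // With minimum = 2 the policy fails no matter how much space is free.
        System.out.println(hasEnoughResources(volumes, reserved, 1));
        System.out.println(hasEnoughResources(volumes, reserved, 2));

        // The proposed check reports the real cause up front.
        try {
            validateMinimum(2, volumes.size());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, freeing disk space or lowering the reserved threshold cannot change the outcome for a minimum of 2, because the count of healthy volumes is bounded by the total number of volumes. Failing fast at startup points the operator at the setting itself rather than at disk usage.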



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
