[
https://issues.apache.org/jira/browse/HDFS-16547?focusedWorklogId=765117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765117
]
ASF GitHub Bot logged work on HDFS-16547:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/May/22 20:16
Start Date: 02/May/22 20:16
Worklog Time Spent: 10m
Work Description: xkrogen commented on code in PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#discussion_r863153584
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##########
@@ -1899,6 +1899,10 @@ synchronized void transitionToStandby() throws IOException {
synchronized void transitionToObserver() throws IOException {
String operationName = "transitionToObserver";
namesystem.checkSuperuserPrivilege(operationName);
+ if (namesystem.isInSafeMode()) {
Review Comment:
I think we can guard this by `notBecomeActiveInSafemode`. Though the config
claims to be about "active" status, the logic in `monitorHealth` just generally
considers a standby NN as unhealthy if it's in safemode, and I think the intent
here is the same as with that config.
Also: `namesystem.isInSafeMode()` -> `isInSafeMode()` (`NameNode` redefines
this method)
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSZKFailoverController.java:
##########
@@ -301,6 +304,40 @@ public void testManualFailoverWithDFSHAAdmin() throws Exception {
waitForHAState(1, HAServiceState.STANDBY);
}
+ /**
+ * Tests that a Namenode in safe mode should not be transfer to observer state.
+ */
+ @Test
+ public void testManualFailoverWithDFSHAAdminInSafemode() throws Exception {
+ startCluster();
+ NamenodeProtocols nn1 = cluster.getNameNode(1).getRpcServer();
+
+ // Enter safe mode.
+ nn1.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER, false);
+ // Test NameNodeRpcServer.
+ LambdaTestUtils.intercept(SafeModeException.class,
+ "Cannot transition to observer. Name node is in safe mode",
+ () -> nn1.transitionToObserver(
+ new StateChangeRequestInfo(RequestSource.REQUEST_BY_USER_FORCED)));
+
+ // Test DFSHAAdmin.
+ DFSHAAdmin tool = new DFSHAAdmin();
+ tool.setConf(conf);
+ System.setIn(new ByteArrayInputStream("yes\n".getBytes()));
+ int result = tool.run(
+ new String[]{"-transitionToObserver", "-forcemanual", "nn2"});
+ assertEquals("State transition returned: " + result, -1, result);
Review Comment:
This should be in `TestDFSHAAdminMiniCluster`
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSZKFailoverController.java:
##########
@@ -301,6 +304,40 @@ public void testManualFailoverWithDFSHAAdmin() throws Exception {
waitForHAState(1, HAServiceState.STANDBY);
}
+ /**
+ * Tests that a Namenode in safe mode should not be transfer to observer state.
+ */
+ @Test
+ public void testManualFailoverWithDFSHAAdminInSafemode() throws Exception {
+ startCluster();
+ NamenodeProtocols nn1 = cluster.getNameNode(1).getRpcServer();
+
+ // Enter safe mode.
+ nn1.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER, false);
+ // Test NameNodeRpcServer.
+ LambdaTestUtils.intercept(SafeModeException.class,
+ "Cannot transition to observer. Name node is in safe mode",
+ () -> nn1.transitionToObserver(
+ new StateChangeRequestInfo(RequestSource.REQUEST_BY_USER_FORCED)));
Review Comment:
This should probably be in `TestHASafeMode`, where we already have
`testTransitionToActiveWhenSafeMode`
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##########
@@ -1899,6 +1899,10 @@ synchronized void transitionToStandby() throws IOException {
synchronized void transitionToObserver() throws IOException {
String operationName = "transitionToObserver";
namesystem.checkSuperuserPrivilege(operationName);
+ if (namesystem.isInSafeMode()) {
+ throw namesystem.newSafemodeException("Cannot transition to " +
+ OBSERVER_STATE);
Review Comment:
Consolidate this logic with the exception from `transitionToActive`:
```java
if (notBecomeActiveInSafemode && isInSafeMode()) {
  throw new ServiceFailedException(getRole() + " still not leave safemode");
}
```
I don't think we need to make `newSafemodeException` public; it seems fine to
just throw a new exception here?
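To illustrate the consolidation suggested above, here is a minimal, self-contained sketch. It is not the actual `NameNode` code: the class `SafeModeGuardSketch`, the helper `checkNotInSafeMode`, the `HAState` enum, the local `ServiceFailedException`, and the hard-coded role string are all hypothetical stand-ins, assumed only so the example compiles on its own. It shows one way both transitions could share the `notBecomeActiveInSafemode` guard.

```java
// Hypothetical, simplified model of the suggested guard. None of these types
// are the real Hadoop classes; they exist only to keep the sketch standalone.
class SafeModeGuardSketch {

  /** Local stand-in for the exception thrown by transitionToActive. */
  static class ServiceFailedException extends Exception {
    ServiceFailedException(String message) {
      super(message);
    }
  }

  /** Simplified HA states (assumption: the real code uses state objects). */
  enum HAState { ACTIVE, STANDBY, OBSERVER }

  // Models the existing config flag referenced in the review comment.
  private final boolean notBecomeActiveInSafemode;
  private boolean inSafeMode;
  private HAState state = HAState.STANDBY;

  SafeModeGuardSketch(boolean notBecomeActiveInSafemode, boolean inSafeMode) {
    this.notBecomeActiveInSafemode = notBecomeActiveInSafemode;
    this.inSafeMode = inSafeMode;
  }

  /** Hypothetical shared helper: one guard, one message, used by both paths. */
  private void checkNotInSafeMode() throws ServiceFailedException {
    if (notBecomeActiveInSafemode && inSafeMode) {
      // "NameNode" stands in for getRole() in the real code.
      throw new ServiceFailedException("NameNode still not leave safemode");
    }
  }

  synchronized void transitionToActive() throws ServiceFailedException {
    checkNotInSafeMode();
    state = HAState.ACTIVE;
  }

  synchronized void transitionToObserver() throws ServiceFailedException {
    checkNotInSafeMode();
    state = HAState.OBSERVER;
  }

  synchronized void leaveSafeMode() {
    inSafeMode = false;
  }

  synchronized HAState getState() {
    return state;
  }
}
```

With the flag enabled, a node in safe mode rejects both transitions with the same exception and stays in its current state; once it leaves safe mode, the transition to observer goes through. With the flag disabled, the guard is a no-op, which matches how the existing check in `transitionToActive` behaves.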
Issue Time Tracking
-------------------
Worklog Id: (was: 765117)
Time Spent: 50m (was: 40m)
> [SBN read] Namenode in safe mode should not be transfered to observer state
> ---------------------------------------------------------------------------
>
> Key: HDFS-16547
> URL: https://issues.apache.org/jira/browse/HDFS-16547
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Tao Li
> Assignee: Tao Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently, when a Namenode is in safe mode (either during startup or after
> entering safe mode manually), it can still be transitioned to Observer by
> command. Such an Observer may receive many requests and throw a
> SafeModeException for them, which causes unnecessary failover on the client.
> So a Namenode in safe mode should not be transitioned to the observer state.