Re: [PR] HDFS-17223. Add journalnode maintenance node list [hadoop]

via GitHub Tue, 14 Nov 2023 09:38:17 -0800


xinglin commented on code in PR #6183:
URL: https://github.com/apache/hadoop/pull/6183#discussion_r1392945651



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java:
##########
@@ -667,6 +700,9 @@ AsyncLoggerSet getLoggerSetForTests() {
 
   @Override
   public void doPreUpgrade() throws IOException {
+    if (isEnableJnMaintenance()) {
+      throw new IOException("doPreUpgrade() does not support enabling jn 
maintenance mode");

Review Comment:
   nit: doPreUpgrade() does not support enabling jn maintenance mode -> 
doPreUpgrade() is not support while in jn maintenance mode



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java:
##########
@@ -684,6 +720,9 @@ public void doPreUpgrade() throws IOException {
 
   @Override
   public void doUpgrade(Storage storage) throws IOException {
+    if (isEnableJnMaintenance()) {
+      throw new IOException("doUpgrade() does not support enabling jn 
maintenance mode");

Review Comment:
   same here.



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml:
##########
@@ -6333,6 +6333,20 @@
     </description>
   </property>
 
+  <property>
+    <name>dfs.journalnode.maintenance.nodes</name>
+    <value></value>
+    <description>
+      In the case of one out of three journal nodes being down, theoretically 
the service can still
+      continue. However, in reality, the downed node may not recover quickly. 
If the Namenode needs
+      to be restarted, it will try the downed journal node through the lengthy 
RPC retry mechanism,
+      resulting in a long initialization time for the Namenode to provide 
services. By adding the
+      downed journal node to the maintenance nodes, the initialization time of 
the Namenode in such
+      scenarios can be accelerated.

Review Comment:
   nit
   
   ->
   
    In the case that one out of three journal nodes being down, theoretically 
HDFS can still
         function. However, in reality, the unavailable journal node may not 
recover quickly. During this period, when we need to restart an Namenode, the 
Namenode will try to connect to the unavailable journal node through the 
lengthy RPC retry mechanism,
         resulting in a long initialization time for the Namenode. By adding 
these
         unavailable journal nodes to the maintenance nodes, we will skip these 
unavailable journal nodes during Namenode initialization and thus reduce 
namenode startup time.
   
   1-node example values: <>
   2-node example values: <>



##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java:
##########
@@ -1137,4 +1139,24 @@ public void testAddTransferRateMetricForInvalidValue() {
     DFSUtil.addTransferRateMetric(mockMetrics, 100, 0);
     verify(mockMetrics, times(0)).addReadTransferRate(anyLong());
   }
+
+  @Test
+  public void testGetHostSet() {
+    String[] testAddrs = new String[] {NS1_NN_ADDR, NS1_NN1_ADDR};

Review Comment:
   this test case is a bit confusing. Can we just use 
"unreachable-host1.com:9000" instead?



##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java:
##########
@@ -1171,6 +1176,21 @@ public void testSelectViaRpcAfterJNRestart() throws 
Exception {
     }
   }
 
+  /**
+   * Tests to throw an exception if the jn maintenance nodes exceeds half of 
the journalnode number.
+   */
+  @Test
+  public void testJNMaintenanceListViaRpcTwoJNsError() throws Exception {
+    StringJoiner maintenanceListBuff = new StringJoiner(",");
+    for (int i = 0; i < 2; i++) {
+      maintenanceListBuff.add(
+          
NetUtils.getHostPortString(cluster.getJournalNode(i).getBoundIpcAddress()));
+    }
+    this.conf.set(DFS_JOURNALNODE_MAINTENANCE_NODES_KEY, 
maintenanceListBuff.toString());
+    assertThrows(IllegalArgumentException.class, () -> createSpyingQJM());
+  }
+
+

Review Comment:
   I assume we also need a test case to to verify maintenance journal nodes are 
indeed excluded? total JN=3, with 1 is excluded. after we initialize, verify 
only 2 JNs are included.



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java:
##########
@@ -701,6 +740,9 @@ public void doUpgrade(Storage storage) throws IOException {
   
   @Override
   public void doFinalize() throws IOException {
+    if (isEnableJnMaintenance()) {
+      throw new IOException("doFinalize() does not support enabling jn 
maintenance mode");

Review Comment:
   same here. and for others functions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDFS-17223. Add journalnode maintenance node list [hadoop]

Reply via email to