[jira] [Commented] (HDFS-17223) Add journalnode maintenance node list

ASF GitHub Bot (Jira) Tue, 14 Nov 2023 09:12:08 -0800


    [ https://issues.apache.org/jira/browse/HDFS-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785987#comment-17785987 ]


ASF GitHub Bot commented on HDFS-17223:
---------------------------------------

xinglin commented on code in PR #6183:
URL: https://github.com/apache/hadoop/pull/6183#discussion_r1392944771


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java:
##########
@@ -406,21 +421,39 @@ private void recoverUnclosedSegment(long segmentTxId) throws IOException {
             logToSync.getStartTxId(),
             logToSync.getEndTxId()));
   }
-  
-  static List<AsyncLogger> createLoggers(Configuration conf,
+
+  List<AsyncLogger> createLoggers(Configuration conf,
+                                  URI uri,
+                                  NamespaceInfo nsInfo,
+                                  AsyncLogger.Factory factory,
+                                  String nameServiceId)
+      throws IOException {
+    String[] skipNodesHostPort = conf.getTrimmedStrings(
+        DFS_JOURNALNODE_MAINTENANCE_NODES_KEY, DFS_JOURNALNODE_MAINTENANCE_NODES_DEFAULT);
+    return createLoggers(conf, uri, nsInfo, factory, nameServiceId, skipNodesHostPort);
+  }
+
+  private List<AsyncLogger> createLoggers(Configuration conf,
                                          URI uri,
                                          NamespaceInfo nsInfo,
                                          AsyncLogger.Factory factory,
-                                         String nameServiceId)
+                                         String nameServiceId,
+                                         String[] skipNodesHostPort)
       throws IOException {
     List<AsyncLogger> ret = Lists.newArrayList();
     List<InetSocketAddress> addrs = Util.getAddressesList(uri, conf);
     if (addrs.size() % 2 == 0) {
       LOG.warn("Quorum journal URI '" + uri + "' has an even number " +
           "of Journal Nodes specified. This is not recommended!");
     }
+    setQuorumJournalCount(addrs.size());
+    HostSet skipSet = DFSUtil.getHostSet(skipNodesHostPort);
     String jid = parseJournalId(uri);
     for (InetSocketAddress addr : addrs) {
+      if(skipSet.match(addr)) {
+        LOG.info("The node {} is a maintenance node and will skip initialization.", addr);

Review Comment:
   nit: "will skip initialization" -> "will be skipped"
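The diff filters the quorum's journal node addresses against a configured maintenance list before any loggers are created. A minimal standalone sketch of that filtering step follows. It assumes exact "host:port" string matching and uses a hypothetical `MaintenanceFilter` class in place of Hadoop's `DFSUtil.getHostSet`/`HostSet`, whose actual matching rules may differ.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper illustrating the idea in the diff: drop every journal
 * node whose "host:port" appears in a maintenance list. This is NOT Hadoop's
 * DFSUtil/HostSet code; matching here is exact string comparison (assumption).
 */
public class MaintenanceFilter {

  /** Normalizes an address to "host:port" without triggering DNS lookups. */
  static String key(InetSocketAddress addr) {
    return addr.getHostString() + ":" + addr.getPort();
  }

  /** Returns the addresses that are not listed as maintenance nodes. */
  static List<InetSocketAddress> filter(List<InetSocketAddress> addrs,
                                        String[] skipNodesHostPort) {
    Set<String> skip = new HashSet<>();
    for (String s : skipNodesHostPort) {
      String t = s.trim();
      if (!t.isEmpty()) {          // ignore blank entries
        skip.add(t);
      }
    }
    List<InetSocketAddress> kept = new ArrayList<>();
    for (InetSocketAddress addr : addrs) {
      if (skip.contains(key(addr))) {
        System.out.println("The node " + addr + " is a maintenance node and will be skipped.");
        continue;                  // no logger is created for this node
      }
      kept.add(addr);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<InetSocketAddress> addrs = List.of(
        InetSocketAddress.createUnresolved("jn1", 8485),
        InetSocketAddress.createUnresolved("jn2", 8485),
        InetSocketAddress.createUnresolved("jn3", 8485));
    List<InetSocketAddress> kept = filter(addrs, new String[] {"jn3:8485"});
    System.out.println("Loggers to create: " + kept.size());
  }
}
```

With three configured nodes and `jn3:8485` listed for maintenance, only `jn1` and `jn2` remain, so no connection attempts (and no IPC timeouts) are spent on `jn3`.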





> Add journalnode maintenance node list
> -------------------------------------
>
>                 Key: HDFS-17223
>                 URL: https://issues.apache.org/jira/browse/HDFS-17223
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: qjm
>    Affects Versions: 3.3.6
>            Reporter: kuper
>            Priority: Major
>              Labels: pull-request-available
>
> * With 3 journal nodes configured in HDFS, if 1 journal node fails to start
> because of machine problems and only 2 are available, namenode
> initialization takes a long time (around 30-40 minutes, depending on the
> IPC timeout and retry policy configuration).
> * The failed journal node cannot be recovered immediately, but HDFS can
> still function in this situation. We hit this issue in our production
> environment and had to reduce the IPC timeout and adjust the retry policy
> to speed up namenode initialization and restore service.
> * Would it be possible to add a journal node maintenance list, so that when
> we know in advance that a journal node cannot provide service, the namenode
> can skip it and initialize faster?
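Skipping a maintenance node is only safe while the remaining journal nodes can still form a write majority. The diff records the full configured count via `setQuorumJournalCount(addrs.size())`, which suggests the majority is still computed over all configured nodes. Assuming that, and assuming majority = n/2 + 1 (integer division), the hypothetical `QuorumCheck` sketch below shows why the 3-node case described above still works with one node in maintenance but not with two.

```java
/**
 * Hypothetical check (not Hadoop code): can the journal nodes left after
 * skipping maintenance nodes still form a majority of the configured count?
 */
public class QuorumCheck {

  /** Majority of n configured nodes, using integer division. */
  static int majority(int configured) {
    return configured / 2 + 1;
  }

  static boolean canFormQuorum(int configured, int inMaintenance) {
    if (inMaintenance < 0 || inMaintenance > configured) {
      throw new IllegalArgumentException("invalid maintenance count");
    }
    return configured - inMaintenance >= majority(configured);
  }

  public static void main(String[] args) {
    // 3 configured, 1 in maintenance: 2 remain, majority is 2
    System.out.println(canFormQuorum(3, 1)); // prints "true"
    // 3 configured, 2 in maintenance: 1 remains, majority is 2
    System.out.println(canFormQuorum(3, 2)); // prints "false"
  }
}
```

The same arithmetic explains the warning in the diff about an even number of journal nodes: with 4 nodes the majority is 3, so only one node can be unavailable, the same tolerance as with 3 nodes.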



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
