[jira] [Commented] (HDFS-17223) Add journalnode maintenance node list

ASF GitHub Bot (Jira) Mon, 27 Nov 2023 05:20:06 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790079#comment-17790079
 ]


ASF GitHub Bot commented on HDFS-17223:
---------------------------------------

Hexiaoqiao commented on PR #6183:
URL: https://github.com/apache/hadoop/pull/6183#issuecomment-1827820008

   Thanks @gp1314 and @xinglin for your work. I am not sure I fully understand 
the purpose here.
   
   > In the case of configuring 3 journal nodes in HDFS, if only 2 journal 
nodes are available and 1 journal node fails to start due to machine issues, it 
will result in a long initialization time for the namenode (around 30-40 
minutes, depending on the IPC timeout and retry policy configuration).
   
   Do you mean that a NameNode restart takes an extra 30~40 minutes when 1 of 
the 3 JNs is unavailable? I am curious where that time is spent. IIUC, as long 
as the majority of JNs work well, the NameNode should connect to and interact 
with them normally.
   
   > The failed journal node cannot recover immediately, but HDFS can still 
function in this situation. In our production environment, we encountered this 
issue and had to reduce the IPC timeout and adjust the retry policy to 
accelerate the namenode initialization and provide services.
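   
   For reference, a rough back-of-the-envelope sketch of how the IPC settings 
mentioned in the quote above could add up. It assumes the stall comes from 
connect timeouts against the unreachable JN, and uses 
`ipc.client.connect.timeout` = 20000 ms and 
`ipc.client.connect.max.retries.on.timeouts` = 45, which I believe are the 
`core-default.xml` defaults (please verify for your version). The class and 
method names are hypothetical; this is not code from the PR.

```java
import java.time.Duration;

// Hypothetical helper: estimates how long one connection attempt sequence
// may block when every attempt runs into the connect timeout.
public class ConnectStallEstimate {

    // Approximation: timeout per attempt times the number of retries on
    // timeout. Ignores the initial attempt and any sleep between retries.
    public static Duration worstCaseStall(long connectTimeoutMs, int maxRetriesOnTimeouts) {
        return Duration.ofMillis(connectTimeoutMs * maxRetriesOnTimeouts);
    }

    public static void main(String[] args) {
        // Assumed defaults: 20000 ms timeout, 45 retries on timeout.
        Duration defaults = worstCaseStall(20_000L, 45);
        System.out.println(defaults.toMinutes() + " min");   // prints "15 min"

        // Illustrative reduced values, not a recommendation.
        Duration reduced = worstCaseStall(5_000L, 3);
        System.out.println(reduced.getSeconds() + " s");     // prints "15 s"
    }
}
```

   If a couple of such stalls happened in sequence during startup, the total 
would land near the reported 30 minutes, but where exactly the NameNode blocks 
is still the open question here.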
   
   I used to do online maintenance on JNs one by one, and did not hit timeouts 
on the NameNode side. I am not sure what differs between the two cases; one 
possibility is the version (ours is based on 2.7.1 with some internal 
improvements).
   
   Thanks again. Please correct me if I missed something.




> Add journalnode maintenance node list
> -------------------------------------
>
>                 Key: HDFS-17223
>                 URL: https://issues.apache.org/jira/browse/HDFS-17223
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: qjm
>    Affects Versions: 3.3.6
>            Reporter: kuper
>            Priority: Major
>              Labels: pull-request-available
>
> * In the case of configuring 3 journal nodes in HDFS, if only 2 journal nodes 
> are available and 1 journal node fails to start due to machine issues, it 
> will result in a long initialization time for the namenode (around 30-40 
> minutes, depending on the IPC timeout and retry policy configuration). 
> * The failed journal node cannot recover immediately, but HDFS can still 
> function in this situation. In our production environment, we encountered 
> this issue and had to reduce the IPC timeout and adjust the retry policy to 
> accelerate the namenode initialization and provide services. 
> * I'm wondering if it would be possible to have a journal node maintenance 
> list, so that namenode initialization can be sped up when it is known in 
> advance that one journal node cannot provide service?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

