[jira] [Comment Edited] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.

Xudong Cao (JIRA) Mon, 15 Jul 2019 04:43:07 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885107#comment-16885107
 ]


Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:38 AM:
-------------------------------------------------------------

*Test Result:*

In a 3-node HDFS cluster, ubuntu1 (ANN) + ubuntu2 (SNN) + ubuntu3 (SNN), the 
upload logs on ubuntu2 and ubuntu3 are as follows:

1. SNN ubuntu2:
{code:java}
root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" 
hadoop-root-namenode-ubuntu2.log 
2019-07-16 01:52:24,801 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds
2019-07-16 01:53:24,912 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds
2019-07-16 01:54:25,051 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds
2019-07-16 01:55:25,147 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds
2019-07-16 01:56:25,253 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds
2019-07-16 01:57:25,323 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 01:58:25,388 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds
2019-07-16 01:59:25,479 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds{code}
 

2. The other SNN, ubuntu3:
{code:java}
root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" 
hadoop-root-namenode-ubuntu3.log 
2019-07-16 02:00:34,767 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds
2019-07-16 02:02:34,851 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds
2019-07-16 02:04:34,938 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 02:06:35,021 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds
2019-07-16 02:08:35,094 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds
2019-07-16 02:10:35,200 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds
2019-07-16 02:12:35,285 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds
2019-07-16 02:14:35,357 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds
2019-07-16 02:16:35,442 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds
2019-07-16 02:18:35,515 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds
2019-07-16 02:20:35,605 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 02:22:35,675 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds
2019-07-16 02:24:35,771 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds{code}


was (Author: xudongcao):
*Test Result:*

in a 3 nodes HDFS, ubuntu1 (ANN), ubuntu2 (SNN) and ubuntu3 (SNN), the uploading 
log in ubuntu2 and ubuntu3 is as follows:

 

> Standby NameNode should terminate the FsImage put process immediately if the 
> peer NN is not in the appropriate state to receive an image.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14646
>                 URL: https://issues.apache.org/jira/browse/HDFS-14646
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.2
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Major
>         Attachments: blockedInWritingSocket.png, get1.png, get2.png, 
> largeSendQ.png
>
>
> *Problem Description:*
>  In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the 
> image to all other NNs (whether or not the peer NN is the ANN). Even if the 
> peer NN immediately replies with an error (such as 
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE or 
> TransferResult.OLD_TRANSACTION_ID_FAILURE), the local SNN does not terminate 
> the put process immediately. Instead, it puts the entire FsImage to the peer 
> NN and does not read the peer NN's reply until the put has completed.
> In a relatively large HDFS cluster, the FsImage can often reach about 30GB. 
> In this case, the invalid put causes two problems:
>  # It wastes time and bandwidth.
>  # Since the ImageServlet of the peer NN no longer receives the FsImage, the 
> socket Send-Q of the local SNN grows very large, and the ImageUpload thread 
> is blocked writing to the socket for a long time. As a result, the local 
> StandbyCheckpointer thread is often blocked for several hours.
> *An example is as follows:*
>  In the following figures, the local NN 100.76.3.234 is an SNN, the peer NN 
> 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN 
> starts to put the FsImage, 170 immediately replies with a 
> NOT_ACTIVE_NAMENODE_FAILURE error. In this case, the local SNN should 
> terminate the put immediately, but in fact it has to wait until the image has 
> been completely put to the peer NN before it can read the response.
>  # At this time, since the ImageServlet of the peer NN no longer receives the 
> FsImage, the socket Send-Q of the local SNN is very large:
> !largeSendQ.png!
>  # Moreover, the local SNN's ImageUpload thread is blocked writing to the 
> socket for a long time:
> !blockedInWritingSocket.png!
>  # Eventually, the StandbyCheckpointer thread of the local SNN waits for the 
> execution result of the ImageUpload thread, blocking in Future.get(), and the 
> blocking time may be as long as several hours:
> !get1.png!
> !get2.png!
>  
>  
> *Solution:*
>  When the local SNN plans to put an FsImage to the peer NN, it needs to test 
> whether it really needs to put the image at this time. The test process is:
>  # Establish an HTTP connection with the peer NN, send the put request, and 
> then immediately read the response (this is the key point). If the peer NN 
> replies with any of the following errors 
> (TransferResult.AUTHENTICATION_FAILURE, 
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, 
> TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put 
> process.
>  # If the peer NN is indeed the Active NameNode and is now in the 
> appropriate state to receive an image, it replies with HTTP response 410 
> (HttpServletResponse.SC_GONE, which corresponds to 
> TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can actually 
> begin to put the image.
> *Note:*
>  This problem needs a large cluster to reproduce (the FsImage in our cluster 
> is about 30GB), so a unit test is difficult to write. In our cluster, after 
> the modification, the problem has been solved and there is no longer a large 
> Send-Q backlog.
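
The probe-then-put flow described under *Solution* can be sketched roughly as follows. This is only an illustration of the decision logic, not the actual patch: the class ImagePutProbe, the local ProbeResult enum (a stand-in that reuses the result names from the issue text, not Hadoop's own TransferResult type), and the probe/upload callbacks are all hypothetical names introduced here.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of the probe-then-put flow; not the actual HDFS patch.
final class ImagePutProbe {

    // Result names follow the issue text. This enum is a local stand-in,
    // not Hadoop's TransferFsImage.TransferResult.
    enum ProbeResult {
        AUTHENTICATION_FAILURE,
        NOT_ACTIVE_NAMENODE_FAILURE,
        OLD_TRANSACTION_ID_FAILURE,
        UNEXPECTED_FAILURE
    }

    // Replies that, per the issue text, should end the put immediately.
    private static final Set<ProbeResult> TERMINATE = EnumSet.of(
        ProbeResult.AUTHENTICATION_FAILURE,
        ProbeResult.NOT_ACTIVE_NAMENODE_FAILURE,
        ProbeResult.OLD_TRANSACTION_ID_FAILURE);

    // True only when the probe reply is not one of the terminating errors.
    static boolean shouldUpload(ProbeResult probeReply) {
        return !TERMINATE.contains(probeReply);
    }

    // probe:  assumed helper that sends the put request and reads the
    //         peer's reply right away, before any image bytes are sent.
    // upload: assumed helper that streams the FsImage to the peer.
    // Returns true if the upload was started, false if it was skipped.
    static boolean putImage(Supplier<ProbeResult> probe, Runnable upload) {
        ProbeResult reply = probe.get();
        if (!shouldUpload(reply)) {
            return false; // peer rejected the put; do not send the image
        }
        upload.run();
        return true;
    }
}
```

The point of the sketch is the ordering: the reply is read before the image is streamed, so a rejecting peer costs one round trip instead of a full FsImage transfer.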



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

