[ 
https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885107#comment-16885107
 ] 

Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:38 AM:
-------------------------------------------------------------

*Test Result:*

In a 3-node HDFS cluster (ubuntu1 is the ANN; ubuntu2 and ubuntu3 are SNNs), the FsImage upload logs on ubuntu2 and ubuntu3 are as follows:

1. SNN ubuntu2:
{code:java}
root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu2.log
2019-07-16 01:52:24,801 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds
2019-07-16 01:53:24,912 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds
2019-07-16 01:54:25,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds
2019-07-16 01:55:25,147 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds
2019-07-16 01:56:25,253 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds
2019-07-16 01:57:25,323 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 01:58:25,388 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds
2019-07-16 01:59:25,479 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds
{code}
 

2. The other SNN, ubuntu3:
{code:java}
root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu3.log
2019-07-16 02:00:34,767 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds
2019-07-16 02:02:34,851 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds
2019-07-16 02:04:34,938 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 02:06:35,021 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds
2019-07-16 02:08:35,094 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds
2019-07-16 02:10:35,200 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds
2019-07-16 02:12:35,285 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds
2019-07-16 02:14:35,357 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds
2019-07-16 02:16:35,442 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds
2019-07-16 02:18:35,515 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds
2019-07-16 02:20:35,605 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 02:22:35,675 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds
2019-07-16 02:24:35,771 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds
{code}
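The upload lines above can also be checked mechanically, for example to confirm that every upload targets only the ANN at http://ubuntu1:9870, or to measure the time between consecutive uploads (about 60 s on ubuntu2 and about 120 s on ubuntu3 in these logs). The following is a minimal Java sketch, not part of Hadoop: the class `UploadLogParser` and its methods are hypothetical, and the regular expression assumes each log entry sits on a single line in exactly the `TransferFsImage` message format shown above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses TransferFsImage "Uploaded image" log lines. */
final class UploadLogParser {

  /** One parsed upload event. */
  static final class Upload {
    final LocalDateTime time;
    final long txid;
    final String target;
    final double seconds;

    Upload(LocalDateTime time, long txid, String target, double seconds) {
      this.time = time;
      this.txid = txid;
      this.target = target;
      this.seconds = seconds;
    }
  }

  // Timestamp format of the NameNode log lines above, e.g. "2019-07-16 01:52:24,801".
  private static final DateTimeFormatter TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // Assumes exactly the single-line message format shown above; other versions may differ.
  private static final Pattern LINE = Pattern.compile(
      "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) INFO \\S*TransferFsImage: "
          + "Uploaded image with txid (\\d+) to namenode at (\\S+) in ([\\d.]+) seconds");

  /** Returns the parsed event, or null if the line is not an upload line. */
  static Upload parse(String line) {
    Matcher m = LINE.matcher(line.trim());
    if (!m.matches()) {
      return null;
    }
    return new Upload(
        LocalDateTime.parse(m.group(1), TS),
        Long.parseLong(m.group(2)),
        m.group(3),
        Double.parseDouble(m.group(4)));
  }

  /** Wall-clock time between two uploads, i.e. the observed checkpoint cadence. */
  static Duration gap(Upload earlier, Upload later) {
    return Duration.between(earlier.time, later.time);
  }
}
```

Feeding the `grep "Uploaded"` output through `parse` and checking that every `target` equals `http://ubuntu1:9870` confirms that, with the change applied, neither SNN streamed an image to the other SNN.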


was (Author: xudongcao):
*Test Result:*

In a 3-node HDFS cluster (ubuntu1 is the ANN; ubuntu2 and ubuntu3 are SNNs), the FsImage upload logs on ubuntu2 and ubuntu3 are as follows:

 

> Standby NameNode should terminate the FsImage put process immediately if the 
> peer NN is not in the appropriate state to receive an image.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14646
>                 URL: https://issues.apache.org/jira/browse/HDFS-14646
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.2
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Major
>         Attachments: blockedInWritingSocket.png, get1.png, get2.png, 
> largeSendQ.png
>
>
> *Problem Description:*
>  In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the
> image to all other NNs, whether or not the peer NN is the ANN. Even if the
> peer NN immediately replies with an error (such as
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE or
> TransferResult.OLD_TRANSACTION_ID_FAILURE), the local SNN does not terminate
> the put. Instead, it streams the entire FsImage to the peer NN and reads the
> peer's reply only after the put has completed.
> In a relatively large HDFS cluster, an FsImage can reach about 30GB. In this
> case, this useless put causes two problems:
>  # It wastes time and bandwidth.
>  # Since the peer NN's ImageServlet no longer receives the FsImage, the local
> SNN's socket Send-Q grows very large and the ImageUpload thread blocks in a
> socket write for a long time. As a result, the local StandbyCheckpointer
> thread is often blocked for several hours.
> *An example is as follows:*
>  In the following figures, the local NN 100.76.3.234 is an SNN, the peer NN
> 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN
> starts to put the FsImage, 170 immediately replies with a
> NOT_ACTIVE_NAMENODE_FAILURE error. At that point the local SNN should
> terminate the put, but in fact it has to wait until the image has been
> completely put to the peer NN before it can read the response.
>  # Since the peer NN's ImageServlet no longer receives the FsImage, the local
> SNN's socket Send-Q becomes very large:
> !largeSendQ.png!
>  # Moreover, the local SNN's ImageUpload thread is blocked in a socket write
> for a long time:
> !blockedInWritingSocket.png!
>  # Eventually, the local SNN's StandbyCheckpointer thread, which waits for the
> ImageUpload thread's result, blocks in Future.get(), possibly for several
> hours:
> !get1.png!
> !get2.png!
>  
>  
> *Solution:*
>  When the local SNN is about to put an FsImage to the peer NN, it should first
> check whether the put is actually needed. The check works as follows:
>  # Establish an HTTP connection with the peer NN, send the put request, and
> then immediately read the response (this is the key point). If the peer NN
> replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE,
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE,
> TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put immediately.
>  # If the peer NN is indeed the Active NameNode AND is currently in the
> appropriate state to receive an image, it replies with HTTP 410
> (HttpServletResponse.SC_GONE, which corresponds to
> TransferResult.UNEXPECTED_FAILURE). Only then does the local SNN actually
> begin to put the image.
> *Note:*
>  This problem can only be reproduced in a large cluster (the FsImage in our
> cluster is about 30GB), so it is difficult to cover with a unit test. In our
> cluster, the modification has solved the problem, and the large Send-Q
> backlog no longer occurs.
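The "wasting time and bandwidth" point in the problem description can be quantified with simple arithmetic. Below is a minimal Java sketch under two stated assumptions: the uploading SNN puts a full image to every other NameNode, and at most one of those peers is the ANN, so every other put is rejected only after the whole image has been streamed. The class and method names are hypothetical. The result is a lower bound, because the ANN itself may also reject the image (for example with OLD_TRANSACTION_ID_FAILURE).

```java
/**
 * Hypothetical back-of-the-envelope model of the wasted FsImage transfer.
 * Assumes a full image is put to every other NameNode and at most one peer is the ANN.
 */
final class WastedPutEstimate {

  /** Peers per checkpoint that are not the ANN: everyone except the uploader and the ANN. */
  static int nonActivePeers(int totalNameNodes) {
    if (totalNameNodes < 2) {
      throw new IllegalArgumentException("need at least two NameNodes");
    }
    return totalNameNodes - 2;
  }

  /** Lower bound on bytes streamed per checkpoint to peers that will reject the image. */
  static long wastedBytes(int totalNameNodes, long imageBytes) {
    // multiplyExact fails loudly instead of silently overflowing for huge inputs.
    return Math.multiplyExact((long) nonActivePeers(totalNameNodes), imageBytes);
  }
}
```

For example, with three NameNodes and an FsImage of about 30GB, each checkpoint streams at least 30GB to a peer SNN that has already answered NOT_ACTIVE_NAMENODE_FAILURE.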


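The decision step of the proposed solution (read the peer's reply first, then stream the image only if the peer answered 410) can be sketched as follows. This is not the actual HDFS-14646 patch: the class `EarlyPutDecision` and its methods are hypothetical, the network exchange itself is left out, and treating every status other than 410 as "abort" is an assumption of this sketch (the proposal explicitly lists only the three TransferResult errors). It uses the JDK constant `HttpURLConnection.HTTP_GONE`, which is 410.

```java
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the early-response check: the probe status of each peer
 * is inspected before any image bytes are written.
 */
final class EarlyPutDecision {

  enum Action { STREAM_IMAGE, ABORT }

  /** Only a 410 (SC_GONE) reply means the peer is the ANN and ready for the image. */
  static Action decide(int probeStatus) {
    return probeStatus == HttpURLConnection.HTTP_GONE ? Action.STREAM_IMAGE : Action.ABORT;
  }

  /** Returns, in iteration order, the peers that should actually receive the image. */
  static List<String> peersToStream(Map<String, Integer> probeStatusByPeer) {
    List<String> targets = new ArrayList<>();
    for (Map.Entry<String, Integer> e : probeStatusByPeer.entrySet()) {
      if (decide(e.getValue()) == Action.STREAM_IMAGE) {
        targets.add(e.getKey());
      }
    }
    return targets;
  }
}
```

Because the status is read before the body is written, a peer that rejects the put costs one request/response round trip instead of a full FsImage transfer, which is what keeps the Send-Q from filling up.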

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
