[
https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885107#comment-16885107
]
Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:38 AM:
-------------------------------------------------------------
*Test Result:*
In a 3-node HDFS cluster with ubuntu1 (ANN), ubuntu2 (SNN), and ubuntu3 (SNN),
the upload logs on ubuntu2 and ubuntu3 are as follows:
1. SNN ubuntu2:
{code:java}
root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded"
hadoop-root-namenode-ubuntu2.log
2019-07-16 01:52:24,801 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds
2019-07-16 01:53:24,912 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds
2019-07-16 01:54:25,051 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds
2019-07-16 01:55:25,147 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds
2019-07-16 01:56:25,253 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds
2019-07-16 01:57:25,323 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 01:58:25,388 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds
2019-07-16 01:59:25,479 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds{code}
2. Another SNN, ubuntu3:
{code:java}
root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded"
hadoop-root-namenode-ubuntu3.log
2019-07-16 02:00:34,767 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds
2019-07-16 02:02:34,851 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds
2019-07-16 02:04:34,938 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 02:06:35,021 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds
2019-07-16 02:08:35,094 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds
2019-07-16 02:10:35,200 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds
2019-07-16 02:12:35,285 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds
2019-07-16 02:14:35,357 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds
2019-07-16 02:16:35,442 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds
2019-07-16 02:18:35,515 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds
2019-07-16 02:20:35,605 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds
2019-07-16 02:22:35,675 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds
2019-07-16 02:24:35,771 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds{code}
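The entries above suggest that ubuntu2 uploaded roughly every 60 seconds and ubuntu3 roughly every 120 seconds. As a rough way to check such intervals, the following sketch extracts the timestamp, txid, and duration from "Uploaded image" entries and computes the gap between consecutive uploads. The class `UploadLogParser` and its methods are hypothetical helpers written for this illustration; they are not part of Hadoop. The sample input is copied from the ubuntu3 log above.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the logs above; not part of Hadoop.
class UploadLogParser {

    /** One parsed "Uploaded image" log entry. */
    static final class Upload {
        final LocalDateTime time;
        final long txid;
        final double seconds;

        Upload(LocalDateTime time, long txid, double seconds) {
            this.time = time;
            this.txid = txid;
            this.seconds = seconds;
        }
    }

    // Tokens are separated by \s+ so that entries wrapped across lines still match.
    private static final Pattern UPLOADED = Pattern.compile(
        "(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}:\\d{2}:\\d{2}),(\\d{3})\\s+INFO\\s+"
        + "\\S*TransferFsImage:\\s+Uploaded\\s+image\\s+with\\s+txid\\s+(\\d+)"
        + "\\s+to\\s+namenode\\s+at\\s+\\S+\\s+in\\s+([0-9.]+)\\s+seconds");

    /** Returns every "Uploaded image" entry found in the given log text, in order. */
    static List<Upload> parse(String logText) {
        List<Upload> uploads = new ArrayList<>();
        Matcher m = UPLOADED.matcher(logText);
        while (m.find()) {
            LocalDateTime time = LocalDateTime
                .of(LocalDate.parse(m.group(1)), LocalTime.parse(m.group(2)))
                .plusNanos(Long.parseLong(m.group(3)) * 1_000_000L);
            uploads.add(new Upload(time, Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5))));
        }
        return uploads;
    }

    /** Milliseconds between each upload and the one before it. */
    static List<Long> intervalsMillis(List<Upload> uploads) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < uploads.size(); i++) {
            gaps.add(Duration.between(uploads.get(i - 1).time,
                uploads.get(i).time).toMillis());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Two entries copied from the ubuntu3 log, wrapped as in the email.
        String sample =
            "2019-07-16 02:00:34,767 INFO\n"
            + "org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with\n"
            + "txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds\n"
            + "2019-07-16 02:02:34,851 INFO\n"
            + "org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with\n"
            + "txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds\n";
        List<Upload> uploads = parse(sample);
        System.out.println(uploads.size() + " uploads, gaps (ms): "
            + intervalsMillis(uploads));
    }
}
```

The pattern matches whitespace loosely because the log lines in this email are wrapped; on the original single-line log entries it should behave the same way.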
was (Author: xudongcao):
*Test Result:*
in a 3 nodes HDFS, ubuntu1 (ANN), ubuntu2 (SNN) and ubuntu3(SNN), the uploading
log in ubuntu2 and ubuntu3 is as follows:
> Standby NameNode should terminate the FsImage put process immediately if the
> peer NN is not in the appropriate state to receive an image.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.2
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Major
> Attachments: blockedInWritingSocket.png, get1.png, get2.png,
> largeSendQ.png
>
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts
> the image to all other NNs (whether or not the peer NN is the ANN). Even
> if the peer NN immediately replies with an error (such as
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE or
> TransferResult.OLD_TRANSACTION_ID_FAILURE), the local SNN does not terminate
> the put process immediately. Instead, it puts the entire FsImage to the peer
> NN and does not read the peer NN's reply until the put is completed.
> In a relatively large HDFS cluster, the size of the FsImage can often reach
> about 30GB. In this case, this invalid put causes two problems:
> # It wastes time and bandwidth.
> # Since the ImageServlet of the peer NN no longer receives the FsImage, the
> socket Send-Q of the local SNN grows very large, and the ImageUpload thread
> is blocked writing to the socket for a long time. As a result, the local
> StandbyCheckpointer thread is often blocked for several hours.
> *An example is as follows:*
> In the following figures, the local NN 100.76.3.234 is an SNN, the peer NN
> 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN
> starts to put the FsImage, 170 immediately replies with a
> NOT_ACTIVE_NAMENODE_FAILURE error. In this case, the local SNN should
> terminate the put immediately, but in fact it has to wait until the image has
> been completely put to the peer NN before it can read the response.
> # At this time, since the ImageServlet of the peer NN no longer receives the
> FsImage, the socket Send-Q of the local SNN is very large:
> !largeSendQ.png!
> # Moreover, the local SNN's ImageUpload thread is blocked writing to the
> socket for a long time:
> !blockedInWritingSocket.png!
> # Eventually, the StandbyCheckpointer thread of the local SNN waits for the
> result of the ImageUpload thread, blocking in Future.get(), and the blocking
> time may be as long as several hours:
> !get1.png!
> !get2.png!
>
>
> *Solution:*
> When the local SNN plans to put an FsImage to the peer NN, it needs to test
> whether it really needs to put the image at this time. The test process is:
> # Establish an HTTP connection with the peer NN, send the put request, and
> then immediately read the response (this is the key point). If the peer NN
> replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE,
> TransferResult.NOT_ACTIVE_NAMENODE_FAILURE,
> TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process.
> # If the peer NN is indeed the Active NameNode and is now in the
> appropriate state to receive an image, it replies with HTTP status 410
> (HttpServletResponse.SC_GONE, which corresponds to
> TransferResult.UNEXPECTED_FAILURE). At this point, the local SNN can actually
> begin to put the image.
> *Note:*
> This problem needs a large cluster to reproduce (the size of the FsImage
> in our cluster is about 30GB), so a unit test is difficult to write.
> In our cluster, the problem has been solved after the modification, and the
> large Send-Q backlog no longer occurs.
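
The test process described in the Solution section can be sketched roughly as follows: send the put request, read the response status immediately, and only then decide whether to stream the image. This is a minimal illustration, not the actual HDFS-14646 patch. The class `ImagePutProbe`, the `Decision` enum, and both methods are hypothetical names. The 410 status comes from the description above; the numeric codes 403, 417, and 409 for the three TransferResult failures are assumptions and should be checked against TransferFsImage.TransferResult.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch of the "probe before put" idea; not the actual patch.
class ImagePutProbe {

    /** What the local SNN should do after reading the peer NN's reply. */
    enum Decision { PROCEED, ABORT, UNSPECIFIED }

    // Status named in the issue description (HttpServletResponse.SC_GONE).
    static final int SC_GONE = 410;

    // ASSUMED status codes for the three TransferResult failures listed above.
    static final int ASSUMED_AUTHENTICATION_FAILURE = 403;
    static final int ASSUMED_NOT_ACTIVE_NAMENODE_FAILURE = 417;
    static final int ASSUMED_OLD_TRANSACTION_ID_FAILURE = 409;

    /** Maps the peer NN's HTTP status to a decision, following the two steps above. */
    static Decision decide(int httpStatus) {
        switch (httpStatus) {
            case SC_GONE:
                // Peer is active and ready to receive an image: start the real put.
                return Decision.PROCEED;
            case ASSUMED_AUTHENTICATION_FAILURE:
            case ASSUMED_NOT_ACTIVE_NAMENODE_FAILURE:
            case ASSUMED_OLD_TRANSACTION_ID_FAILURE:
                // Peer rejected the request: terminate the put immediately.
                return Decision.ABORT;
            default:
                // The issue description does not say what to do with other codes.
                return Decision.UNSPECIFIED;
        }
    }

    /**
     * Sends a PUT request without an image body and reads the status right away,
     * instead of reading it only after the whole image has been written.
     */
    static Decision probe(URL putUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) putUrl.openConnection();
        conn.setRequestMethod("PUT");
        try {
            return decide(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

The decision logic is kept separate from the network call so that it can be exercised without a running NameNode, for example `ImagePutProbe.decide(410)` yields `PROCEED`.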