[ 
https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824329#comment-17824329
 ] 

ruiliang edited comment on HDFS-17407 at 3/7/24 9:29 AM:
---------------------------------------------------------

After analyzing the logs and the source code, the cause appears to be that the two 
Standby NameNodes (SbNN) started a checkpoint at almost the same time. When the 
later one checked its upload stream, the image had already been updated, so it 
threw an exception. Should this case really be reported as an exception?
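
To make the suspected race concrete, here is a toy model of it. This is only a sketch under my own assumptions, not Hadoop code: ImageReceiver, offer() and UploadResult are hypothetical names, and the receiver simply remembers the newest image txid it has accepted. Two standbys offer an image for the same txid (57258734310, taken from the fsimage file name in the log below); the first is accepted and the second is told the image already exists.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CheckpointRaceSketch {
    /** Hypothetical outcome of an upload attempt (not Hadoop's real type). */
    enum UploadResult { ACCEPTED, ALREADY_HAVE_IMAGE }

    /** Hypothetical stand-in for the node that receives checkpoint images. */
    static final class ImageReceiver {
        // txid of the newest image accepted so far; -1 means none yet
        private final AtomicLong latestImageTxId = new AtomicLong(-1);

        /** Accepts the image only if it is newer than what is already stored. */
        UploadResult offer(long txId) {
            while (true) {
                long current = latestImageTxId.get();
                if (txId <= current) {
                    return UploadResult.ALREADY_HAVE_IMAGE;
                }
                if (latestImageTxId.compareAndSet(current, txId)) {
                    return UploadResult.ACCEPTED;
                }
                // another uploader won the race; re-read and decide again
            }
        }
    }

    public static void main(String[] args) {
        ImageReceiver receiver = new ImageReceiver();
        long txId = 57258734310L; // from fsimage_0000000057258734310
        System.out.println("SbNN 1: " + receiver.offer(txId)); // ACCEPTED
        System.out.println("SbNN 2: " + receiver.offer(txId)); // ALREADY_HAVE_IMAGE
    }
}
```

In this model the second result is an expected outcome rather than a failure, which is why logging it at ERROR level with a stack trace looks questionable to me.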

SbNN 1 log

 
{code:java}
root@cluster06-yynn1:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-cluster06-nn1.xx.com.log 
2024-03-07 16:48:00,061 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4afc4056 
expecting start txid #57258734311
2024-03-07 16:48:00,061 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:02,592 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(162)) - Edits file 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 of size 35380849 edits # 214398 loaded in 2 seconds {code}
SbNN 2 log

 
{code:java}
root@cluster06-yynn3:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-cluster06-nn3.xx.com.log
2024-03-07 16:48:32,536 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6d0659cd 
expecting start txid #57258734311
2024-03-07 16:48:32,536 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:35,634 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(162)) - Edits file 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 of size 35380849 edits # 214398 loaded in 3 seconds 
...

2024-03-07 16:48:32,547 INFO  namenode.TransferFsImage 
(TransferFsImage.java:copyFileToStream(394)) - Sending fileName: 
/data/hadoop/hdfs/namenode/current/fsimage_0000000057258734310, fileSize: 
4811881207. Sent total: 2228224 bytes. Size of last segment intended to send: 
131072 bytes.
java.io.IOException: Error writing request body to server
        at 
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
        at 
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:376)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:320)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:236)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:231)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2024-03-07 16:48:33,051 ERROR ha.StandbyCheckpointer 
(StandbyCheckpointer.java:doWork(452)) - Exception in doCheckpoint
java.io.IOException: Exception during image upload
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:257)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1500(StandbyCheckpointer.java:62)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:432)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:331)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:351)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
        at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:347)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Error 
writing request body to server
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:250)
        ... 9 more
Caused by: java.io.IOException: Error writing request body to server
        at 
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
        at 
sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:376)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:320)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
        at 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:236)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:231)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748){code}
 

 

Both checkpoints started from txid #57258734311.

The "java.io.IOException: Error writing request body to server" comes from the following JDK code:

 
{code:java}
// From sun.net.www.protocol.http.HttpURLConnection.StreamingOutputStream
void checkError () throws IOException {
    if (closed) {
        throw new IOException ("Stream is closed");
    }
    if (error) {
        throw errorExcp;
    }
    if (((PrintStream)out).checkError()) {
        throw new IOException("Error writing request body to server");
    }
}

// ... (the methods below are from java.io.PrintStream)

private void ensureOpen() throws IOException {
    if (out == null)
        throw new IOException("Stream closed");
}

/**
 * Flushes the stream.  This is done by writing any buffered output bytes to
 * the underlying output stream and then flushing that stream.
 *
 * @see        java.io.OutputStream#flush()
 */
public void flush() {
    synchronized (this) {
        try {
            ensureOpen();
            out.flush();
        }
        catch (IOException x) {
            trouble = true;
        }
    }
} {code}
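
To check this reading of the JDK code, here is a small self-contained sketch. BrokenStream is a hypothetical stand-in for a connection whose remote side has already closed it; it is my assumption that the real upload fails in a similar way. The sketch shows that PrintStream swallows the IOException thrown by the underlying stream and only reports it later through checkError(), so the original cause is lost by the time the "Error writing request body to server" message is produced.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class PrintStreamErrorDemo {
    /** Hypothetical stand-in for a stream whose peer has closed the connection. */
    static final class BrokenStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("connection closed by peer (simulated)");
        }
    }

    /** Writes to a PrintStream over a broken stream and returns its error flag. */
    static boolean writeAndCheck() {
        PrintStream ps = new PrintStream(new BrokenStream());
        // The three-argument write does not declare IOException; the failure
        // from BrokenStream is caught inside PrintStream and recorded as a flag.
        ps.write(new byte[] {1, 2, 3}, 0, 3);
        return ps.checkError();
    }

    public static void main(String[] args) {
        System.out.println("checkError() = " + writeAndCheck()); // checkError() = true
    }
}
```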
 

 

 


> Exception during image upload
> -----------------------------
>
>                 Key: HDFS-17407
>                 URL: https://issues.apache.org/jira/browse/HDFS-17407
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.1.0
>         Environment: hadoop 3.1.0 
> linux:ubuntu 16.04
> ambari-hdp:3.1.1
>            Reporter: ruiliang
>            Priority: Major
>
> After I added a third HDFS NameNode, the service ran fine. However, the logs 
> of the two Standby NameNodes constantly show exceptions during image upload. 
> The image file on the primary node is still being updated normally, which 
> indicates that a standby node has merged the image file and uploaded it to 
> the primary node. I don't understand why the two Standby NameNodes keep 
> logging these exceptions. Is there any potential risk?
>  
> namenode log 
> {code:java}
> 2024-03-01 15:31:46,162 INFO  namenode.TransferFsImage 
> (TransferFsImage.java:copyFileToStream(394)) - Sending fileName: 
> /data/hadoop/hdfs/namenode/current/fsimage_0000000055689095810, fileSize: 
> 4626167848. Sent total: 1703936 bytes. Size of last segment intended to send: 
> 131072 bytes.
> java.io.IOException: Error writing request body to server
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:376)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:320)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:236)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:231)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-03-01 15:31:46,630 INFO  blockmanagement.BlockManager 
> (BlockManager.java:enqueue(4923)) - Block report queue is full
> 2024-03-01 15:31:46,664 ERROR ha.StandbyCheckpointer 
> (StandbyCheckpointer.java:doWork(452)) - Exception in doCheckpoint
> java.io.IOException: Exception during image upload
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:257)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1500(StandbyCheckpointer.java:62)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:432)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:331)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:351)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:347)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
> Error writing request body to server
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:250)
>         ... 9 more
> Caused by: java.io.IOException: Error writing request body to server
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:376)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:320)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:236)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:231)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748) {code}
>  
>  
> For the cluster change I followed the document below and added the 
> configuration for mycluster.nn3:
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html]


