[
https://issues.apache.org/jira/browse/HDFS-17818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041246#comment-18041246
]
ASF GitHub Bot commented on HDFS-17818:
---------------------------------------
tomscut merged PR #7862:
URL: https://github.com/apache/hadoop/pull/7862
> Fix serial fsimage transfer during checkpoint with multiple namenodes
> ---------------------------------------------------------------------
>
> Key: HDFS-17818
> URL: https://issues.apache.org/jira/browse/HDFS-17818
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.5.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2025-10-14-11-24-46-741.png
>
>
> In our cluster, each namespace has four NameNodes: one active, one standby,
> and two observers. When the standby NameNode performs a checkpoint, it
> transfers the fsimage to the other three NameNodes. However, we found that
> these transfers are performed serially.
> The reason is that the corePoolSize of the ThreadPoolExecutor is 0, and the
> transfer tasks never fill the LinkedBlockingQueue. A ThreadPoolExecutor only
> starts threads beyond corePoolSize when its queue is full, so only one
> thread transfers the fsimage at a time. This greatly increases the
> checkpoint time.
> {code:java}
> ExecutorService executor = new ThreadPoolExecutor(0,
>     activeNNAddresses.size(), 100, TimeUnit.MILLISECONDS,
>     new LinkedBlockingQueue<Runnable>(activeNNAddresses.size()),
>     uploadThreadFactory);
> {code}
> !image-2025-10-14-11-24-46-741.png|width=554,height=142!
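
The queueing behavior described above can be reproduced outside Hadoop with a
small standalone sketch. The class name PoolSizingDemo and the helper
distinctWorkerThreads are hypothetical and exist only for this illustration;
the tasks are placeholders rather than real fsimage uploads, and the default
thread factory stands in for uploadThreadFactory. Raising corePoolSize to the
number of targets is shown as one possible remedy, not necessarily the change
made in PR #7862.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingDemo {

  /**
   * Submits `tasks` placeholder jobs to a pool built like the one in the
   * issue (max pool size and queue capacity both equal to `tasks`) and
   * returns how many distinct threads ended up running them.
   */
  static int distinctWorkerThreads(int corePoolSize, int tasks)
      throws InterruptedException {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(corePoolSize, tasks,
        100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(tasks));
    Set<Thread> workers = ConcurrentHashMap.newKeySet();
    // Hold every task until all have been submitted, so a worker cannot
    // finish early and time out between submissions.
    CountDownLatch allSubmitted = new CountDownLatch(1);
    for (int i = 0; i < tasks; i++) {
      executor.execute(() -> {
        try {
          allSubmitted.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        workers.add(Thread.currentThread());
      });
    }
    allSubmitted.countDown();
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
    return workers.size();
  }

  public static void main(String[] args) throws InterruptedException {
    // corePoolSize 0: tasks are queued, the queue never fills, one worker.
    System.out.println("corePoolSize=0: " + distinctWorkerThreads(0, 3));
    // corePoolSize 3: each submission starts its own worker thread.
    System.out.println("corePoolSize=3: " + distinctWorkerThreads(3, 3));
  }
}
```

With three tasks, the first configuration should report 1 worker thread and
the second should report 3, which matches the serial transfers observed in
the issue.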