caozhiqiang created HDFS-17818:
----------------------------------

             Summary: Fix serial fsimage transfer during checkpoint with multiple namenodes
                 Key: HDFS-17818
                 URL: https://issues.apache.org/jira/browse/HDFS-17818
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.5.0
            Reporter: caozhiqiang
            Assignee: caozhiqiang

In our cluster, each namespace has four NameNodes: one active, one standby, and two observers. When the standby NameNode performs a checkpoint, it transfers the fsimage to the other three NameNodes. However, we found that these transfers are performed serially.

The reason is that the corePoolSize of the ThreadPoolExecutor is 0. With a corePoolSize of 0, the executor starts a single worker thread for the first task and adds further threads only when the work queue is full. The LinkedBlockingQueue has the same capacity as the number of transfer tasks, so the tasks never fill it, and only one thread transfers the fsimage at a time. This greatly increases the checkpoint time.

{code:java}
ExecutorService executor = new ThreadPoolExecutor(
    0, activeNNAddresses.size(), 100, TimeUnit.MILLISECONDS,
    new LinkedBlockingQueue<Runnable>(activeNNAddresses.size()),
    uploadThreadFactory);
{code}
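
The queuing behavior described above can be reproduced outside HDFS. The following is a minimal sketch, not the patch for this issue: the class `PoolConcurrencyDemo` and the helper `peakConcurrency` are hypothetical names, the uploads are replaced by tasks that simply wait, and raising corePoolSize to the number of tasks is shown only as one possible way to get parallel execution.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Hadoop.
class PoolConcurrencyDemo {

  /** Submits n tasks and returns the highest number observed running at once. */
  static int peakConcurrency(int corePoolSize, int n) {
    // Same shape as the executor in the report: max pool size and
    // queue capacity are both n; only corePoolSize varies.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        corePoolSize, n, 100, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(n));
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    CountDownLatch allStarted = new CountDownLatch(n);

    for (int i = 0; i < n; i++) {
      executor.execute(() -> {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        allStarted.countDown();
        try {
          // Stand-in for an upload: hold the thread until every task
          // has started, or give up after 300 ms.
          allStarted.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          running.decrementAndGet();
        }
      });
    }

    executor.shutdown(); // queued tasks still run after shutdown()
    try {
      executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return peak.get();
  }

  public static void main(String[] args) {
    // corePoolSize = 0: tasks are queued and a single worker drains them.
    System.out.println("corePoolSize=0: peak=" + peakConcurrency(0, 3));
    // corePoolSize = n: each submission starts its own worker thread.
    System.out.println("corePoolSize=3: peak=" + peakConcurrency(3, 3));
  }
}
```

With three tasks, the first call should report a peak of 1 and the second a peak of 3, which matches the serial transfers observed with three target NameNodes.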