[
https://issues.apache.org/jira/browse/HDFS-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785987#comment-17785987
]
ASF GitHub Bot commented on HDFS-17223:
---------------------------------------
xinglin commented on code in PR #6183:
URL: https://github.com/apache/hadoop/pull/6183#discussion_r1392944771
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java:
##########
@@ -406,21 +421,39 @@ private void recoverUnclosedSegment(long segmentTxId)
throws IOException {
logToSync.getStartTxId(),
logToSync.getEndTxId()));
}
-
- static List<AsyncLogger> createLoggers(Configuration conf,
+
+ List<AsyncLogger> createLoggers(Configuration conf,
+ URI uri,
+ NamespaceInfo nsInfo,
+ AsyncLogger.Factory factory,
+ String nameServiceId)
+ throws IOException {
+ String[] skipNodesHostPort = conf.getTrimmedStrings(
+ DFS_JOURNALNODE_MAINTENANCE_NODES_KEY, DFS_JOURNALNODE_MAINTENANCE_NODES_DEFAULT);
+ return createLoggers(conf, uri, nsInfo, factory, nameServiceId, skipNodesHostPort);
+ }
+
+ private List<AsyncLogger> createLoggers(Configuration conf,
URI uri,
NamespaceInfo nsInfo,
AsyncLogger.Factory factory,
- String nameServiceId)
+ String nameServiceId,
+ String[] skipNodesHostPort)
throws IOException {
List<AsyncLogger> ret = Lists.newArrayList();
List<InetSocketAddress> addrs = Util.getAddressesList(uri, conf);
if (addrs.size() % 2 == 0) {
LOG.warn("Quorum journal URI '" + uri + "' has an even number " +
"of Journal Nodes specified. This is not recommended!");
}
+ setQuorumJournalCount(addrs.size());
+ HostSet skipSet = DFSUtil.getHostSet(skipNodesHostPort);
String jid = parseJournalId(uri);
for (InetSocketAddress addr : addrs) {
+ if(skipSet.match(addr)) {
+ LOG.info("The node {} is a maintenance node and will skip initialization.", addr);
Review Comment:
nit: "will skip initialization" -> "will be skipped"
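For context, the skip step in this hunk can be sketched in isolation roughly as below. This is a minimal sketch, not the patch itself: `MaintenanceSkipSketch` and `filterActive` are hypothetical names, the plain "host:port" string comparison stands in for whatever `HostSet.match` does, and it assumes the loop simply continues past a maintenance node after logging. The message uses the wording suggested above.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the skip logic under review; not Hadoop code.
class MaintenanceSkipSketch {

  // Returns the addresses that are not listed as maintenance nodes.
  // Assumption: entries are matched as exact "host:port" strings; the
  // patch delegates matching to HostSet instead.
  static List<InetSocketAddress> filterActive(List<InetSocketAddress> addrs,
                                              String[] skipNodesHostPort) {
    Set<String> skip = new HashSet<>(Arrays.asList(skipNodesHostPort));
    List<InetSocketAddress> active = new ArrayList<>();
    for (InetSocketAddress addr : addrs) {
      // getHostString() avoids a reverse DNS lookup.
      String hostPort = addr.getHostString() + ":" + addr.getPort();
      if (skip.contains(hostPort)) {
        System.out.println("The node " + hostPort
            + " is a maintenance node and will be skipped.");
        continue; // no logger is created for this node
      }
      active.add(addr);
    }
    return active;
  }
}
```

Called with three unresolved addresses (`jn1:8485`, `jn2:8485`, `jn3:8485`) and a skip list of `{"jn3:8485"}`, the sketch returns the first two addresses and logs one message for the third.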
> Add journalnode maintenance node list
> -------------------------------------
>
> Key: HDFS-17223
> URL: https://issues.apache.org/jira/browse/HDFS-17223
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: qjm
> Affects Versions: 3.3.6
> Reporter: kuper
> Priority: Major
> Labels: pull-request-available
>
> * In the case of configuring 3 journal nodes in HDFS, if only 2 journal nodes
> are available and 1 journal node fails to start due to machine issues, it
> will result in a long initialization time for the namenode (around 30-40
> minutes, depending on the IPC timeout and retry policy configuration).
> * The failed journal node cannot recover immediately, but HDFS can still
> function in this situation. In our production environment, we encountered
> this issue and had to reduce the IPC timeout and adjust the retry policy to
> accelerate the namenode initialization and provide services.
> * I'm wondering if it would be possible to have a journal node maintenance
> list to speed up the namenode initialization when it is known in advance
> that one journal node cannot provide services?
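The scenario above relies on the quorum journal needing only a majority of journal nodes to acknowledge a write. A rough sketch of that arithmetic follows. `QuorumMathSketch` and its methods are hypothetical names, and it assumes the majority is computed from the number of configured nodes rather than from the number of nodes actually contacted, which is what the `setQuorumJournalCount(addrs.size())` call in the patch appears to preserve.

```java
// Hypothetical helper illustrating quorum arithmetic; not Hadoop code.
class QuorumMathSketch {

  // Majority of the configured journal nodes: floor(n / 2) + 1.
  static int majority(int configured) {
    return configured / 2 + 1;
  }

  // True if enough nodes remain to form a majority after `skipped`
  // nodes are placed in the maintenance list.
  static boolean quorumReachable(int configured, int skipped) {
    return configured - skipped >= majority(configured);
  }
}
```

With 3 configured nodes the majority is 2, so skipping one node still leaves a reachable quorum, while skipping two does not.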