[
https://issues.apache.org/jira/browse/HDFS-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866415#comment-17866415
]
ASF GitHub Bot commented on HDFS-16690:
---------------------------------------
Hexiaoqiao commented on code in PR #6925:
URL: https://github.com/apache/hadoop/pull/6925#discussion_r1679405906
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeSyncer.java:
##########
@@ -187,7 +198,15 @@ private void startSyncJournalsDaemon() {
while(shouldSync) {
try {
if (!journal.isFormatted()) {
- LOG.warn("Journal cannot sync. Not formatted.");
+ LOG.warn("Journal cannot sync. Not formatted. Trying to format with the syncer");
+ formatWithSyncer();
+ if (journal.isFormatted() && !createEditsSyncDir()) {
+ LOG.error("Failed to create directory for downloading log " +
+ "segments: {}. Stopping Journal Node Sync.",
+ journal.getStorage().getEditsSyncDir());
+ return;
+ }
+ continue;
Review Comment:
Will this drop into an endless loop when `tryFormatting` is set to false,
which is the default here?
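To make the concern concrete, here is a minimal standalone sketch, not the PR's code. It assumes the pause between sync attempts sits at the bottom of the loop body, so that `continue` skips it, and that formatting is a no-op when `tryFormatting` is false. The class `SyncLoopSketch`, the method `iterationsWithoutPause`, and the iteration cap are hypothetical names used only for illustration.

```java
public class SyncLoopSketch {

    /**
     * Simulates a sync loop in which an unformatted journal hits `continue`
     * before reaching the pause at the bottom of the loop body.
     * Returns how many iterations skipped the pause.
     */
    public static int iterationsWithoutPause(boolean tryFormatting, int maxIterations) {
        boolean formatted = false;
        int skippedPauses = 0;
        // The real loop is `while (shouldSync)`; the cap keeps this sketch finite.
        for (int i = 0; i < maxIterations; i++) {
            if (!formatted) {
                if (tryFormatting) {
                    formatted = true; // stand-in for a successful format attempt
                }
                skippedPauses++;
                continue; // jumps past the pause below
            }
            // A formatted journal would sync here and then pause
            // (a Thread.sleep in a real daemon; omitted in this sketch).
        }
        return skippedPauses;
    }
}
```

With `tryFormatting` false, every iteration takes the `continue` path in this sketch, so the loop never reaches the pause. If the real loop behaves the same way, keeping the pause on that path (or skipping the `continue` when formatting is disabled) would avoid the busy loop.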
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java:
##########
@@ -245,17 +246,24 @@ public GetEditLogManifestResponseProto getEditLogManifest(
String jid, String nameServiceId,
long sinceTxId, boolean inProgressOk)
throws IOException {
-
+
RemoteEditLogManifest manifest = jn.getOrCreateJournal(jid, nameServiceId)
.getEditLogManifest(sinceTxId, inProgressOk);
-
+
return GetEditLogManifestResponseProto.newBuilder()
.setManifest(PBHelper.convert(manifest))
.setHttpPort(jn.getBoundHttpAddress().getPort())
.setFromURL(jn.getHttpServerURI())
.build();
}
+ @Override
+ public StorageInfoProto getStorageInfo(String jid,
+ String nameServiceId) throws IOException {
Review Comment:
`nameServiceId` is no longer used here, so why should we define it?
> Automatically format new unformatted JournalNodes using JournalNodeSyncer
> --------------------------------------------------------------------------
>
> Key: HDFS-16690
> URL: https://issues.apache.org/jira/browse/HDFS-16690
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node
> Affects Versions: 3.4.0, 3.3.5
> Environment: Demonstrated in a Kubernetes environment running Java 11.
> # Start new cluster, but short 1 JN (minimum quorum, and the missing JN
> won’t resolve). VERIFY:
> - NN formats the 2 existing JN and stabilizes. NOTE: Formatting using just
> a quorum will be a separate submission
> - Messages show sync between JN-0 and JN-1, and NN -> JN.
> # Scale JN stateful set to add missing JN. VERIFY:
> - New JN starts
> - All other JN and all NN report IP address change (IP Address resolution).
> NOTE: require HADOOP-18365 and HDFS-16688
> - Messages show sync between all JN, and NN -> JN
> - New JN is formatted at least once (possibly by multiple other JN)
> - New JN storage directory is formatted only once
> - New JN joins cluster (lastWriterEpoch is non-zero)
> Reporter: Steve Vaughan
> Assignee: Steve Vaughan
> Priority: Major
> Labels: pull-request-available
>
> If an unformatted JournalNode is added to an existing JournalNode set,
> instances of the JournalNodeSyncer are unable to sync to the new node. When
> a sync receives a JournalNotFormattedException, we can initiate a format
> operation, and then retry the synchronization.
> Conceptually this means that the JournalNodes and their data can be managed
> independently from the rest of the system, as the JournalNodes will
> incorporate new JournalNode instances. Once the new JournalNode is
> formatted, it can participate in shared edits from the NameNodes.
> I've been testing an update to the InterQJournalProtocol to add a format call
> like that used by the NameNode. Current tests include starting an HA cluster
> from scratch, but with 2 JournalNode instances. Once the cluster is up, I
> can add the 3rd JournalNode (which is unformatted), and the other 2
> JournalNodes will eventually attempt to sync, which triggers a format
> followed by a sync.
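
The format-then-retry flow described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `RemoteJournal`, `syncWithFormat`, and the nested `JournalNotFormattedException` are hypothetical stand-ins defined here so the example is self-contained, and the single retry is a simplification.

```java
public class FormatThenRetrySketch {

    /** Local stand-in for the exception a sync sees when the peer is unformatted. */
    public static class JournalNotFormattedException extends Exception {
        public JournalNotFormattedException(String message) {
            super(message);
        }
    }

    /** Hypothetical view of the peer JournalNode used by the syncer. */
    public interface RemoteJournal {
        void sync() throws JournalNotFormattedException;
        void format();
    }

    /**
     * Attempts a sync; if the peer reports it is not formatted,
     * formats it once and retries the sync a single time.
     * Returns true if a sync eventually succeeded.
     */
    public static boolean syncWithFormat(RemoteJournal remote) {
        try {
            remote.sync();
            return true;
        } catch (JournalNotFormattedException notFormatted) {
            remote.format();
            try {
                remote.sync();
                return true;
            } catch (JournalNotFormattedException stillNotFormatted) {
                // Give up for this round; a later sync cycle can try again.
                return false;
            }
        }
    }
}
```

Limiting the retry to one attempt per sync cycle keeps a peer that cannot be formatted from blocking the caller; the next scheduled cycle simply tries again.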