[
https://issues.apache.org/jira/browse/HDFS-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863001#comment-17863001
]
ASF GitHub Bot commented on HDFS-16690:
---------------------------------------
aswinmprabhu opened a new pull request, #6925:
URL: https://github.com/apache/hadoop/pull/6925
### Description of PR
If an unformatted JournalNode is added to an existing JournalNode set:
1. JournalNodeSyncer is unable to sync from the other JNs to this new JN
2. NameNode is unable to flush edit logs to the new JN
This scenario can arise in several situations, such as:
- OS upgrade maintenance of the JN host (with a data disk wipe)
- Moving the JN application to a new host due to h/w issues
- Installing additional JNs (3 -> 5) for better HA during maintenance
operations
Manually fixing this involves rsyncing the VERSION file to the edit log root
directory from the other healthy JNs. A similar issue concerning the paxos
directory was solved in
[HDFS-10659](https://issues.apache.org/jira/browse/HDFS-10659).
This PR tries to leverage the already existing JournalNodeSyncer daemon to
format the JournalNode on which it is running when it discovers that syncs
can't happen due to the `JournalNotFormattedException`. JournalNodeSyncer calls
the `formatWithSyncer` method if it sees that the JN is unformatted.
`formatWithSyncer` will loop over the other JN proxies, trying to fetch the
`StorageInfo` object from them. The StorageInfo object is then used to format
the JN by calling `JNStorage.format()`.
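The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the PR: `StorageInfoLite`, `PeerJournal`, `LocalFormatter` and `isValid()` are hypothetical stand-ins for `StorageInfo`, the JN proxies and `JNStorage.format()`, and the validity check (a non-empty cluster ID) is an assumption.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;

/** Hypothetical, trimmed-down stand-in for HDFS's StorageInfo. */
final class StorageInfoLite {
  final int layoutVersion;
  final String clusterId;
  final int namespaceId;
  final long cTime;

  StorageInfoLite(int layoutVersion, String clusterId, int namespaceId, long cTime) {
    this.layoutVersion = layoutVersion;
    this.clusterId = clusterId;
    this.namespaceId = namespaceId;
    this.cTime = cTime;
  }

  /** Assumption: a peer that is itself unformatted reports no cluster ID. */
  boolean isValid() {
    return clusterId != null && !clusterId.isEmpty();
  }
}

/** Hypothetical view of a remote JournalNode proxy. */
interface PeerJournal {
  StorageInfoLite getStorageInfo() throws IOException;
}

/** Hypothetical hook standing in for JNStorage.format(). */
interface LocalFormatter {
  void format(StorageInfoLite info);
}

final class SyncerFormatSketch {
  /**
   * Asks each peer in turn for its storage info and formats the local
   * journal with the first valid answer. Returns the info that was used,
   * or empty if no peer could supply one (the caller can try again later).
   */
  static Optional<StorageInfoLite> formatWithSyncer(
      List<PeerJournal> peers, LocalFormatter formatter) {
    for (PeerJournal peer : peers) {
      StorageInfoLite info;
      try {
        info = peer.getStorageInfo();
      } catch (IOException e) {
        continue; // peer unreachable: try the next one
      }
      if (info == null || !info.isValid()) {
        continue; // e.g. the peer is unformatted as well
      }
      formatter.format(info);
      return Optional.of(info);
    }
    return Optional.empty();
  }
}
```

Skipping peers that return an invalid answer matches the manual test below, where the first peer queried returned an invalid `StorageInfo` and the next one supplied a usable one.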
### How was this patch tested?
Unit tests will be added once I get some initial feedback.
**I've tested the changes manually in a K8S cluster with 3 JNs:**
JN root dir before testing:
```
[root@asprabhu-hadoop-hdfs-jn-0 current]# ls
VERSION edits_0000000000000000001-0000000000000000042
last-promised-epoch paxos
committed-txid edits_inprogress_0000000000000000043
last-writer-epoch
[root@asprabhu-hadoop-hdfs-jn-0 current]# pwd
/grid/edits/journal/data/asprabhu-hadoop/current
```
Logs show that JN can receive edit logs: `2024-07-04 08:50:55,242 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /grid/edits/journal/data/asprabhu-hadoop/current/edits_inprogress_0000000000000000001 -> /grid/edits/journal/data/asprabhu-hadoop/current/edits_0000000000000000001-0000000000000000042`
Killed the JN process and deleted the edit logs root dir:
```
[root@asprabhu-hadoop-hdfs-jn-0 data]# kill -9 67
[root@asprabhu-hadoop-hdfs-jn-0 data]# rm -rf *
[root@asprabhu-hadoop-hdfs-jn-0 data]# ls
[root@asprabhu-hadoop-hdfs-jn-0 data]# pwd
/grid/edits/journal/data
```
Started the JN process. The syncer formatted the JN. Some relevant log lines:
```
2024-07-04 09:15:44,437 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Starting SyncJournal daemon for journal asprabhu-hadoop
2024-07-04 09:15:44,527 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Trying to format the journal with the syncer
2024-07-04 09:15:44,639 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Got invalid StorageInfo from asprabhu-hadoop-hdfs-jn-0.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com/100.104.107.156:8485
2024-07-04 09:15:44,723 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Got StorageInfo lv=-63;cid=CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5;nsid=613335273;c=1720082854169 from asprabhu-hadoop-hdfs-jn-1.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com/100.98.128.215:8485
2024-07-04 09:15:44,725 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Formatting journal id : asprabhu-hadoop with namespace info: lv=-63;cid=CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5;nsid=613335273;c=1720082854169;bpid=null
2024-07-04 09:15:44,726 INFO org.apache.hadoop.hdfs.server.common.Storage: /grid/edits/journal/data/asprabhu-hadoop does not exist. Creating ...
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1241)
2024-07-04 09:15:44,727 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /grid/edits/journal/data/asprabhu-hadoop/in_use.lock acquired by nodename 9...@asprabhu-hadoop-hdfs-jn-0.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com
2024-07-04 09:15:44,728 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting journal Storage Directory /grid/edits/journal/data/asprabhu-hadoop with nsid: 613335273
2024-07-04 09:15:44,735 INFO org.apache.hadoop.hdfs.server.common.Storage: Creating paxos dir: /grid/edits/journal/data/asprabhu-hadoop/current/paxos
2024-07-04 09:15:44,735 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /grid/edits/journal/data/asprabhu-hadoop/in_use.lock acquired by nodename 9...@asprabhu-hadoop-hdfs-jn-0.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com
2024-07-04 09:15:44,735 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Enabling the journaled edits cache with a capacity of bytes: 1048576
```
Syncer was also able to fill the holes from other JNs:
```
2024-07-04 09:23:44,844 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Syncing Journal /0.0.0.0:8485 with asprabhu-hadoop-hdfs-jn-2.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com/100.98.113.190:8485, journal id: asprabhu-hadoop
2024-07-04 09:23:44,867 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Downloading missing Edit Log from http://asprabhu-hadoop-hdfs-jn-2.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com:8480/getJournal?jid=asprabhu-hadoop&segmentTxId=1&storageInfo=-63%3A613335273%3A1720082854169%3ACID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5&inProgressOk=false to /grid/edits/journal/data/asprabhu-hadoop
2024-07-04 09:23:44,929 INFO org.apache.hadoop.hdfs.server.common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 2000.00 KB/s. Synchronous (fsync) write to disk of /grid/edits/journal/data/asprabhu-hadoop/edits.sync/edits_0000000000000000001-0000000000000000042 took 0.00s.
2024-07-04 09:23:44,929 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Downloaded file edits_0000000000000000001-0000000000000000042 of size 2487 bytes
```
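Judging from the download URL in the log above, the `storageInfo` query parameter looks like the layout version, namespace ID, creation time and cluster ID joined with colons and then URL-encoded (`%3A` is an encoded `:`). The sketch below reproduces that value from the `StorageInfo` fields logged earlier; the class and method names are illustrative and are not Hadoop APIs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class StorageInfoParamSketch {
  /** Joins the fields in the order inferred from the logged URL: lv:nsid:cTime:cid. */
  static String toColonSeparated(int layoutVersion, int namespaceId, long cTime, String clusterId) {
    return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
  }

  /** URL-encodes the colon-separated form so it can be used as a query value. */
  static String toQueryValue(int layoutVersion, int namespaceId, long cTime, String clusterId) {
    return URLEncoder.encode(
        toColonSeparated(layoutVersion, namespaceId, cTime, clusterId),
        StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Values taken from the "Got StorageInfo" log line of the manual test.
    System.out.println(toQueryValue(
        -63, 613335273, 1720082854169L, "CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5"));
    // prints -63%3A613335273%3A1720082854169%3ACID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5
  }
}
```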
JN root dir after the testing:
```
[root@asprabhu-hadoop-hdfs-jn-0 current]# ls
VERSION                                        edits_0000000000000000057-0000000000000000058
committed-txid                                 edits_0000000000000000059-0000000000000000060
edits_0000000000000000001-0000000000000000042  edits_0000000000000000061-0000000000000000062
edits_0000000000000000043-0000000000000000046  edits_0000000000000000063-0000000000000000064
edits_0000000000000000047-0000000000000000048  edits_0000000000000000065-0000000000000000066
edits_0000000000000000049-0000000000000000050  edits_inprogress_0000000000000000067
edits_0000000000000000051-0000000000000000052  last-promised-epoch
edits_0000000000000000053-0000000000000000054  last-writer-epoch
edits_0000000000000000055-0000000000000000056  paxos
[root@asprabhu-hadoop-hdfs-jn-0 current]# pwd
/grid/edits/journal/data/asprabhu-hadoop/current
```
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> Automatically format new unformatted JournalNodes using JournalNodeSyncer
> --------------------------------------------------------------------------
>
> Key: HDFS-16690
> URL: https://issues.apache.org/jira/browse/HDFS-16690
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node
> Affects Versions: 3.4.0, 3.3.5
> Environment: Demonstrated in a Kubernetes environment running Java 11.
> # Start new cluster, but short 1 JN (minimum quorum, and the missing JN
> won’t resolve). VERIFY:
> - NN formats the 2 existing JN and stabilizes. NOTE: Formatting using just
> a quorum will be a separate submission
> - Messages show sync between JN-0 and JN-1, and NN -> JN.
> # Scale JN stateful set to add missing JN. VERIFY:
> - New JN starts
> - All other JN and all NN report IP address change (IP Address resolution).
> NOTE: require HADOOP-18365 and HDFS-16688
> - Messages show sync between all JN, and NN -> JN
> - New JN is formatted at least once (possibly by multiple other JN)
> - New JN storage directory is formatted only once
> - New JN joins cluster (lastWriterEpoch is non-zero)
> Reporter: Steve Vaughan
> Assignee: Steve Vaughan
> Priority: Major
>
> If an unformatted JournalNode is added to an existing JournalNode set,
> instances of the JournalNodeSyncer are unable to sync to the new node. When
> a sync receives a JournalNotFormattedException, we can initiate a format
> operation, and then retry the synchronization.
> Conceptually this means that the JournalNodes and their data can be managed
> independently from the rest of the system, as the JournalNodes will
> incorporate new JournalNode instances. Once the new JournalNode is
> formatted, it can participate in shared edits from the NameNodes.
> I've been testing an update to the InterQJournalProtocol to add a format call
> like that used by the NameNode. Current tests include starting an HA cluster
> from scratch, but with 2 JournalNode instances. Once the cluster is up, I
> can add the 3rd JournalNode (which is unformatted), and the other 2
> JournalNodes will eventually attempt to sync, which results in a format
> and a subsequent sync.
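
The format-then-retry idea in the issue description can be illustrated with a small sketch. It is only an approximation under stated assumptions: `NotFormattedSketchException` is a hypothetical stand-in for `JournalNotFormattedException`, `SyncAttempt` and the `format` callback are invented for the example, and a single retry is assumed.

```java
/** Hypothetical stand-in for HDFS's JournalNotFormattedException. */
final class NotFormattedSketchException extends Exception {
  NotFormattedSketchException(String message) {
    super(message);
  }
}

/** Hypothetical sync attempt against one JournalNode. */
interface SyncAttempt {
  void run() throws NotFormattedSketchException;
}

final class FormatThenRetrySketch {
  /**
   * Runs one sync attempt; if the target reports that it is unformatted,
   * formats it once and retries. Returns true if a format was triggered.
   */
  static boolean syncWithAutoFormat(SyncAttempt sync, Runnable format) {
    try {
      sync.run();
      return false; // already formatted, nothing extra to do
    } catch (NotFormattedSketchException e) {
      format.run(); // assumption: formatting an empty journal is safe here
    }
    try {
      sync.run(); // single retry after formatting
    } catch (NotFormattedSketchException e) {
      throw new IllegalStateException("journal still unformatted after format", e);
    }
    return true;
  }
}
```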