[ https://issues.apache.org/jira/browse/HDFS-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863001#comment-17863001 ]

ASF GitHub Bot commented on HDFS-16690:
---------------------------------------

aswinmprabhu opened a new pull request, #6925:
URL: https://github.com/apache/hadoop/pull/6925

   
   ### Description of PR
   If an unformatted JournalNode is added to an existing JournalNode set:
   1. JournalNodeSyncer is unable to sync from the other JNs to this new JN
   2. The NameNode is unable to flush edit logs to the new JN
   
   This scenario can arise in different situations, such as:
   - OS upgrade maintenance of the JN host (with a data disk wipe)
   - Moving the JN application to a new host due to hardware issues
   - Installing additional JNs (3 -> 5) for better HA during maintenance operations
   
   Fixing this manually involves rsyncing the VERSION file from one of the other, healthy JNs into the edit log root directory of the new JN. A similar issue concerning the paxos directory was solved in [HDFS-10659](https://issues.apache.org/jira/browse/HDFS-10659).
   
   This PR leverages the existing JournalNodeSyncer daemon to format the JournalNode it is running on when the syncer discovers that syncs can't happen because of a `JournalNotFormattedException`. If JournalNodeSyncer sees that the JN is unformatted, it calls the `formatWithSyncer` method, which loops over the proxies for the other JNs and tries to fetch a `StorageInfo` object from each of them. The fetched StorageInfo object is then used to format the JN by calling `JNStorage.format()`.
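The `formatWithSyncer` loop described above can be sketched roughly as follows. This is a self-contained model, not the PR's code: `StorageInfo`, `Peer`, `LocalJournal` and the validity rule in `isValid()` (nonzero namespace ID and a non-empty cluster ID) are hypothetical stand-ins chosen to mirror the fields printed in the logs, and a real implementation would call the JN proxies and `JNStorage.format()` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical, self-contained model of the formatWithSyncer idea (not Hadoop classes). */
public class FormatWithSyncerSketch {

    /** Stand-in for a StorageInfo: lv, nsid, c and cid as shown in the logs. */
    public static final class StorageInfo {
        public final int layoutVersion;
        public final int namespaceId;
        public final long cTime;
        public final String clusterId;

        public StorageInfo(int layoutVersion, int namespaceId, long cTime, String clusterId) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.cTime = cTime;
            this.clusterId = clusterId;
        }

        /** Assumed validity rule: an unformatted JN has no namespace ID or cluster ID. */
        public boolean isValid() {
            return namespaceId != 0 && clusterId != null && !clusterId.isEmpty();
        }

        @Override
        public String toString() {
            return "lv=" + layoutVersion + ";cid=" + clusterId
                + ";nsid=" + namespaceId + ";c=" + cTime;
        }
    }

    /** Hypothetical peer handle: an address plus a call that fetches its StorageInfo. */
    public static final class Peer {
        public final String address;
        public final Callable<StorageInfo> fetchStorageInfo;

        public Peer(String address, Callable<StorageInfo> fetchStorageInfo) {
            this.address = address;
            this.fetchStorageInfo = fetchStorageInfo;
        }
    }

    /** Hypothetical local journal that just records the StorageInfo it was formatted with. */
    public static final class LocalJournal {
        public final List<StorageInfo> formats = new ArrayList<>();

        public void format(StorageInfo info) {
            formats.add(info);
        }
    }

    /** Formats the local journal with the first valid StorageInfo returned by a peer. */
    public static Optional<StorageInfo> formatWithSyncer(List<Peer> peers, LocalJournal journal) {
        for (Peer peer : peers) {
            StorageInfo info;
            try {
                info = peer.fetchStorageInfo.call();
            } catch (Exception e) {
                // Peer unreachable or failing: move on to the next one.
                System.err.println("Could not fetch StorageInfo from " + peer.address + ": " + e);
                continue;
            }
            if (info == null || !info.isValid()) {
                System.err.println("Got invalid StorageInfo from " + peer.address);
                continue;
            }
            System.out.println("Got StorageInfo " + info + " from " + peer.address);
            journal.format(info);
            return Optional.of(info); // stop after the first successful format
        }
        return Optional.empty(); // no peer supplied a usable StorageInfo
    }

    public static void main(String[] args) {
        List<Peer> peers = List.of(
            new Peer("jn-0", () -> new StorageInfo(0, 0, 0L, "")), // unformatted peer
            new Peer("jn-1", () -> new StorageInfo(-63, 613335273, 1720082854169L,
                "CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5")),
            new Peer("jn-2", () -> { throw new IllegalStateException("not reached"); }));
        LocalJournal journal = new LocalJournal();
        Optional<StorageInfo> used = formatWithSyncer(peers, journal);
        System.out.println("formatted: " + used.isPresent() + ", format calls: " + journal.formats.size());
    }
}
```

Trying peers in order and stopping at the first valid answer mirrors the test logs, where the syncer logged an invalid StorageInfo from `jn-0` and then formatted with the one from `jn-1`.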
   
   ### How was this patch tested?
   Unit tests will be added once I get some initial feedback. 
   
   **I've tested the changes manually in a K8S cluster with 3 JNs:**
   
   JN root dir before testing:
   ```
   [root@asprabhu-hadoop-hdfs-jn-0 current]# ls
   VERSION         edits_0000000000000000001-0000000000000000042  last-promised-epoch  paxos
   committed-txid  edits_inprogress_0000000000000000043           last-writer-epoch
   [root@asprabhu-hadoop-hdfs-jn-0 current]# pwd 
   /grid/edits/journal/data/asprabhu-hadoop/current
   ```
   Logs show that the JN can receive edit logs: `2024-07-04 08:50:55,242 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /grid/edits/journal/data/asprabhu-hadoop/current/edits_inprogress_0000000000000000001 -> /grid/edits/journal/data/asprabhu-hadoop/current/edits_0000000000000000001-0000000000000000042`
   
   Killed the JN process and deleted the edit logs root dir:
   ```
   [root@asprabhu-hadoop-hdfs-jn-0 data]# kill -9 67
   [root@asprabhu-hadoop-hdfs-jn-0 data]# rm -rf *
   [root@asprabhu-hadoop-hdfs-jn-0 data]# ls
   [root@asprabhu-hadoop-hdfs-jn-0 data]# pwd
   /grid/edits/journal/data
   ```
   
   Started the JN process. The syncer formatted the JN. Some relevant log lines:
   ```
   2024-07-04 09:15:44,437 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Starting SyncJournal daemon for journal asprabhu-hadoop
   2024-07-04 09:15:44,527 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Trying to format the journal with the syncer
   2024-07-04 09:15:44,639 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Got invalid StorageInfo from asprabhu-hadoop-hdfs-jn-0.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com/100.104.107.156:8485
   2024-07-04 09:15:44,723 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Got StorageInfo lv=-63;cid=CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5;nsid=613335273;c=1720082854169 from asprabhu-hadoop-hdfs-jn-1.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com/100.98.128.215:8485
   2024-07-04 09:15:44,725 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Formatting journal id : asprabhu-hadoop with namespace info: lv=-63;cid=CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5;nsid=613335273;c=1720082854169;bpid=null
   2024-07-04 09:15:44,726 INFO org.apache.hadoop.hdfs.server.common.Storage: /grid/edits/journal/data/asprabhu-hadoop does not exist. Creating ...
           at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1241)
   2024-07-04 09:15:44,727 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /grid/edits/journal/data/asprabhu-hadoop/in_use.lock acquired by nodename 9...@asprabhu-hadoop-hdfs-jn-0.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com
   2024-07-04 09:15:44,728 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting journal Storage Directory /grid/edits/journal/data/asprabhu-hadoop with nsid: 613335273
   2024-07-04 09:15:44,735 INFO org.apache.hadoop.hdfs.server.common.Storage: Creating paxos dir: /grid/edits/journal/data/asprabhu-hadoop/current/paxos
   2024-07-04 09:15:44,735 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /grid/edits/journal/data/asprabhu-hadoop/in_use.lock acquired by nodename 9...@asprabhu-hadoop-hdfs-jn-0.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com
   2024-07-04 09:15:44,735 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Enabling the journaled edits cache with a capacity of bytes: 1048576
   ```
   
   The syncer was also able to fill in the missing edits from the other JNs:
   ```
   2024-07-04 09:23:44,844 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Syncing Journal /0.0.0.0:8485 with asprabhu-hadoop-hdfs-jn-2.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com/100.98.113.190:8485, journal id: asprabhu-hadoop
   2024-07-04 09:23:44,867 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Downloading missing Edit Log from http://asprabhu-hadoop-hdfs-jn-2.asprabhu-hadoop-hdfs-jn-svc.grid-integration-testing.svc.kube.grid.linkedin.com:8480/getJournal?jid=asprabhu-hadoop&segmentTxId=1&storageInfo=-63%3A613335273%3A1720082854169%3ACID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5&inProgressOk=false to /grid/edits/journal/data/asprabhu-hadoop
   2024-07-04 09:23:44,929 INFO org.apache.hadoop.hdfs.server.common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 2000.00 KB/s. Synchronous (fsync) write to disk of /grid/edits/journal/data/asprabhu-hadoop/edits.sync/edits_0000000000000000001-0000000000000000042 took 0.00s.
   2024-07-04 09:23:44,929 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Downloaded file edits_0000000000000000001-0000000000000000042 of size 2487 bytes
   ```
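The `storageInfo` query parameter in the `getJournal` URL above carries the same values as the `StorageInfo` the syncer fetched when formatting, URL-encoded (`%3A` is `:`) and colon-separated. Lining it up with the earlier `Got StorageInfo` log line gives the field order layout version, namespace ID, cTime, cluster ID. The sketch below decodes it back into the log's `lv=...;cid=...;nsid=...;c=...` form; `StorageInfoParamSketch` and `describe` are hypothetical names, and this is not Hadoop's own parsing code.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: renders the getJournal storageInfo parameter like the log lines. */
public class StorageInfoParamSketch {

    public static String describe(String encodedParam) {
        // "%3A" decodes to ":".
        String decoded = URLDecoder.decode(encodedParam, StandardCharsets.UTF_8);
        // Field order lv:nsid:c:cid, as read off the log lines in this PR.
        String[] parts = decoded.split(":", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 colon-separated fields: " + decoded);
        }
        int layoutVersion = Integer.parseInt(parts[0]);
        int namespaceId = Integer.parseInt(parts[1]);
        long cTime = Long.parseLong(parts[2]);
        String clusterId = parts[3];
        return "lv=" + layoutVersion + ";cid=" + clusterId
            + ";nsid=" + namespaceId + ";c=" + cTime;
    }

    public static void main(String[] args) {
        String param = "-63%3A613335273%3A1720082854169%3ACID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5";
        System.out.println(describe(param));
    }
}
```

Running `main` prints `lv=-63;cid=CID-613cdcfb-9a6e-4b3b-a3f6-a33ddc7a6ca5;nsid=613335273;c=1720082854169`, the same values as the format log line, which is a quick way to confirm that the missing segment was requested with the StorageInfo the new JN was formatted with.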
   
   JN root dir after the testing:
   ```
   [root@asprabhu-hadoop-hdfs-jn-0 current]# ls
   VERSION                                        edits_0000000000000000057-0000000000000000058
   committed-txid                                 edits_0000000000000000059-0000000000000000060
   edits_0000000000000000001-0000000000000000042  edits_0000000000000000061-0000000000000000062
   edits_0000000000000000043-0000000000000000046  edits_0000000000000000063-0000000000000000064
   edits_0000000000000000047-0000000000000000048  edits_0000000000000000065-0000000000000000066
   edits_0000000000000000049-0000000000000000050  edits_inprogress_0000000000000000067
   edits_0000000000000000051-0000000000000000052  last-promised-epoch
   edits_0000000000000000053-0000000000000000054  last-writer-epoch
   edits_0000000000000000055-0000000000000000056  paxos
   [root@asprabhu-hadoop-hdfs-jn-0 current]# pwd
   /grid/edits/journal/data/asprabhu-hadoop/current
   ```
   
   
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> Automatically format new unformatted JournalNodes using JournalNodeSyncer 
> --------------------------------------------------------------------------
>
>                 Key: HDFS-16690
>                 URL: https://issues.apache.org/jira/browse/HDFS-16690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: journal-node
>    Affects Versions: 3.4.0, 3.3.5
>         Environment: Demonstrated in a Kubernetes environment running Java 11.
>  # Start a new cluster, but short 1 JN (minimum quorum, and the missing JN won’t resolve). VERIFY:
>  - NN formats the 2 existing JNs and stabilizes. NOTE: Formatting using just a quorum will be a separate submission
>  - Messages show sync between JN-0 and JN-1, and NN -> JN.
>  # Scale the JN StatefulSet to add the missing JN. VERIFY:
>  - New JN starts
>  - All other JNs and all NNs report the IP address change (IP address resolution). NOTE: requires HADOOP-18365 and HDFS-16688
>  - Messages show sync between all JNs, and NN -> JN
>  - New JN is formatted at least once (possibly by multiple other JNs)
>  - New JN storage directory is formatted only once
>  - New JN joins cluster (lastWriterEpoch is non-zero)
>            Reporter: Steve Vaughan
>            Assignee: Steve Vaughan
>            Priority: Major
>
> If an unformatted JournalNode is added to an existing JournalNode set, 
> instances of the JournalNodeSyncer are unable to sync to the new node. When 
> a sync receives a JournalNotFormattedException, we can initiate a format 
> operation and then retry the synchronization.
> Conceptually, this means that the JournalNodes and their data can be managed 
> independently from the rest of the system, as the JournalNodes will 
> incorporate new JournalNode instances. Once the new JournalNode is 
> formatted, it can participate in shared edits from the NameNodes. 
> I've been testing an update to the InterQJournalProtocol to add a format call 
> like the one used by the NameNode. Current tests start an HA cluster 
> from scratch with only 2 JournalNode instances. Once the cluster is up, I 
> add the 3rd JournalNode (which is unformatted), and the other 2 
> JournalNodes eventually attempt to sync, which results in a format 
> and a subsequent sync.
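The issue description proposes a different direction from the PR: a peer JN that hits `JournalNotFormattedException` while syncing asks the new JN to format (through a format call added to `InterQJournalProtocol`) and then retries. A minimal sketch of that catch, format, retry step, assuming a single retry; `TargetJournal`, `acceptSync`, `format` and `syncWithFormat` are hypothetical names, and `JournalNotFormattedException` here is a local stand-in for Hadoop's class of that name. The target turns `format` into a no-op once it is formatted, which is one way to meet the verification steps above: the new JN may be asked to format by several peers, but its storage directory is formatted only once.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of format-on-sync: peers format an unformatted target JN, then retry. */
public class FormatOnSyncSketch {

    /** Local stand-in for Hadoop's JournalNotFormattedException. */
    public static class JournalNotFormattedException extends Exception {
        public JournalNotFormattedException(String message) {
            super(message);
        }
    }

    /** Hypothetical target JN: rejects syncs until formatted; format is idempotent. */
    public static class TargetJournal {
        private String namespaceInfo; // null until formatted
        public final AtomicInteger formatRequests = new AtomicInteger();
        public final AtomicInteger storageFormats = new AtomicInteger();

        public synchronized void acceptSync(long txId) throws JournalNotFormattedException {
            if (namespaceInfo == null) {
                throw new JournalNotFormattedException("Journal not formatted");
            }
            // Accepting edits up to txId is omitted from this sketch.
        }

        /** Hypothetical inter-JN format call. */
        public synchronized void format(String nsInfo) {
            formatRequests.incrementAndGet();
            if (namespaceInfo != null) {
                return; // already formatted by another peer: leave storage alone
            }
            namespaceInfo = nsInfo;
            storageFormats.incrementAndGet();
        }
    }

    /** One sync from a peer: format the target on JournalNotFormattedException, then retry once. */
    public static boolean syncWithFormat(TargetJournal target, String nsInfo, long txId) {
        try {
            target.acceptSync(txId);
            return true;
        } catch (JournalNotFormattedException e) {
            target.format(nsInfo);
        }
        try {
            target.acceptSync(txId); // single retry after formatting
            return true;
        } catch (JournalNotFormattedException e) {
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TargetJournal newJn = new TargetJournal();
        // Two existing JNs sync to the new, unformatted JN at the same time.
        Thread jn0 = new Thread(() -> syncWithFormat(newJn, "nsid=613335273", 42));
        Thread jn1 = new Thread(() -> syncWithFormat(newJn, "nsid=613335273", 42));
        jn0.start();
        jn1.start();
        jn0.join();
        jn1.join();
        // formatRequests may be 1 or 2 depending on timing; storageFormats is always 1.
        System.out.println("format requests: " + newJn.formatRequests.get()
            + ", storage formats: " + newJn.storageFormats.get());
    }
}
```

Because both methods synchronize on the target, whichever peer formats first wins and any later format request only bumps the request counter.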



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

