[jira] [Work logged] (HDDS-1649) On installSnapshot notification from OM leader, download checkpoint and reload OM state

ASF GitHub Bot (JIRA) Thu, 13 Jun 2019 16:01:21 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-1649?focusedWorklogId=259996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-259996
 ]


ASF GitHub Bot logged work on HDDS-1649:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jun/19 23:00
            Start Date: 13/Jun/19 23:00
    Worklog Time Spent: 10m 
      Work Description: arp7 commented on pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293607289
 
 

 ##########
 File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##########
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
     }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the downloaded checkpoint's snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+    if (omSnapshotProvider == null) {
+      LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+          "nodes.");
+      return null;
+    }
+
+    DBCheckpoint omDBcheckpoint;
+    try {
+      omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+    } catch (IOException e) {
+      LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+      return null;
+    }
+
+    // Check if current ratis log index is smaller than the downloaded
+    // snapshot index. If yes, proceed by stopping the ratis server so that
+    // the OM state can be re-initialized. If no, then do not proceed with
+    // installSnapshot.
+    long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+    long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+    if (checkpointSnapshotIndex <= lastAppliedIndex) {
+      LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
+          "applied index: {} is greater than or equal to the checkpoint's " +
+          "snapshot index: {}", leaderId, lastAppliedIndex,
+          checkpointSnapshotIndex);
+      return null;
+    }
+
+    // Stop the ratis server so that no new transactions are applied. This
+    // can happen if a leader election happens while the state is being
+    // re-initialized.
+    omRatisServer.stop();
+
+    // Clear the OM Double Buffer so that if there are any pending
+    // transactions in the buffer, they are discarded.
+    omDoubleBuffer.stop();
+
+    // Take a backup of the current DB
+    File dbFile = metadataManager.getStore().getDbLocation();
 
 Review comment:
   This is going to be a directory, correct? I assume a DB consists of multiple files.
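
   To illustrate the point, here is a minimal sketch of backing up a DB
   location that is a directory rather than a single file. It is not the
   code from this pull request: the class name DbBackupSketch, the method
   backupDbDir, and the backup suffix are hypothetical, and it assumes the
   backup lives next to the original directory on the same filesystem so
   that the move is a plain rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of OzoneManager.
public class DbBackupSketch {

  /**
   * Renames the DB directory to a sibling backup directory.
   * Assumption: dbDir and the backup share a parent, so the move
   * stays on one filesystem and does not copy individual files.
   *
   * @param dbDir        the DB location, expected to be a directory
   * @param backupSuffix suffix appended to the directory name
   * @return path of the backup directory
   */
  static Path backupDbDir(Path dbDir, String backupSuffix)
      throws IOException {
    // Fail early if the DB location is not a directory.
    if (!Files.isDirectory(dbDir)) {
      throw new IOException(
          "Expected DB location to be a directory: " + dbDir);
    }
    Path backup =
        dbDir.resolveSibling(dbDir.getFileName() + backupSuffix);
    // Moving the directory carries all files inside it along.
    return Files.move(dbDir, backup, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

   Naming the variable for what it is (for example dbDir instead of
   dbFile) would make the directory assumption visible at the call site.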
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 259996)

> On installSnapshot notification from OM leader, download checkpoint and 
> reload OM state
> ---------------------------------------------------------------------------------------
>
>                 Key: HDDS-1649
>                 URL: https://issues.apache.org/jira/browse/HDDS-1649
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When an OM follower receives installSnapshot notification from OM leader, it 
> should initiate a new checkpoint on the OM leader and download that 
> checkpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
