arp7 commented on a change in pull request #948: HDDS-1649. On installSnapshot notification from OM leader, download checkpoint and reload OM state
URL: https://github.com/apache/hadoop/pull/948#discussion_r293606799
 ##########
 File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
 ##########
 @@ -3122,6 +3136,131 @@ public boolean setAcl(OzoneObj obj, List<OzoneAcl> acls) throws IOException {
     }
   }
 
+  /**
+   * Download and install latest checkpoint from leader OM.
+   * If the download checkpoints snapshot index is greater than this OM's
+   * last applied transaction index, then re-initialize the OM state via this
+   * checkpoint. Before re-initializing OM state, the OM Ratis server should
+   * be stopped so that no new transactions can be applied.
+   * @param leaderId peerNodeID of the leader OM
+   * @return If checkpoint is installed, return the corresponding termIndex.
+   * Otherwise, return null.
+   */
+  public TermIndex installSnapshot(String leaderId) {
+    if (omSnapshotProvider == null) {
+      LOG.error("OM Snapshot Provider is not configured as there are no peer " +
+          "nodes.");
+      return null;
+    }
+
+    DBCheckpoint omDBcheckpoint;
+    try {
+      omDBcheckpoint = omSnapshotProvider.getOzoneManagerDBSnapshot(leaderId);
+    } catch (IOException e) {
+      LOG.error("Failed to download checkpoint from OM leader {}", leaderId, e);
+      return null;
+    }
+
+    // Check if current ratis log index is smaller than the downloaded
+    // snapshot index. If yes, proceed by stopping the ratis server so that
+    // the OM state can be re-initialized. If no, then do not proceed with
+    // installSnapshot.
+    long lastAppliedIndex = omRatisServer.getStateMachineLastAppliedIndex();
+    long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
+    if (checkpointSnapshotIndex <= lastAppliedIndex) {
+      LOG.error("Failed to install checkpoint from OM leader: {}. The last " +

 Review comment:
 How do we recover from this situation eventually? Should we retry fetching a more recent snapshot?
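
 To make the retry question concrete, here is a minimal sketch of what "retry fetching a more recent snapshot" could look like. It is not code from the patch. The class SnapshotRetrySketch, the interface CheckpointSource, and the maxAttempts and backoffMillis parameters are all hypothetical names introduced for illustration; only the comparison of the checkpoint index against the last applied index mirrors the diff above.

```java
import java.io.IOException;
import java.util.OptionalLong;

/** Hypothetical sketch; none of these names exist in Ozone or Ratis. */
final class SnapshotRetrySketch {

  /** Assumed stand-in for whatever downloads a checkpoint from the leader. */
  @FunctionalInterface
  interface CheckpointSource {
    /** Returns the Ratis snapshot index of the checkpoint just downloaded. */
    long downloadCheckpointIndex(String leaderId) throws IOException;
  }

  private SnapshotRetrySketch() { }

  /**
   * Re-downloads the checkpoint until its snapshot index is strictly greater
   * than lastAppliedIndex, or until maxAttempts downloads have been made.
   * Returns the usable index, or empty if no newer checkpoint was obtained.
   */
  static OptionalLong fetchNewerCheckpoint(CheckpointSource source,
      String leaderId, long lastAppliedIndex, int maxAttempts,
      long backoffMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        long checkpointIndex = source.downloadCheckpointIndex(leaderId);
        // Same condition as in the diff, inverted: only a checkpoint that is
        // ahead of the local state machine is worth installing.
        if (checkpointIndex > lastAppliedIndex) {
          return OptionalLong.of(checkpointIndex);
        }
      } catch (IOException e) {
        // A failed download is handled like a stale checkpoint: try again.
      }
      if (attempt < maxAttempts) {
        try {
          // Linear backoff gives the leader time to take a newer checkpoint.
          Thread.sleep(backoffMillis * attempt);
        } catch (InterruptedException ie) {
          // Preserve the interrupt for the caller and give up.
          Thread.currentThread().interrupt();
          return OptionalLong.empty();
        }
      }
    }
    return OptionalLong.empty();
  }
}
```

 An empty result would still have to be handled by the caller, so a bounded retry like this only narrows the failure window; it does not answer what the follower should do if the leader never produces a newer checkpoint.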