devmadhuu commented on code in PR #6640:
URL: https://github.com/apache/ozone/pull/6640#discussion_r1596857549


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java:
##########
@@ -381,25 +397,36 @@ connectionFactory, getOzoneManagerSnapshotUrl(),
    */
   @VisibleForTesting
   boolean updateReconOmDBWithNewSnapshot() throws IOException {
+    // Check permissions of the Recon DB directory
+    checkAndValidateReconDbPermissions();
     // Obtain the current DB snapshot from OM and
     // update the in house OM metadata managed DB instance.
     long startTime = Time.monotonicNow();
     DBCheckpoint dbSnapshot = getOzoneManagerDBSnapshot();
     metrics.updateSnapshotRequestLatency(Time.monotonicNow() - startTime);
-    if (dbSnapshot != null && dbSnapshot.getCheckpointLocation() != null) {
-      LOG.info("Got new checkpoint from OM : " +
-          dbSnapshot.getCheckpointLocation());
-      try {
-        omMetadataManager.updateOmDB(
-            dbSnapshot.getCheckpointLocation().toFile());
-        return true;
-      } catch (IOException e) {
-        LOG.error("Unable to refresh Recon OM DB Snapshot. ", e);
-      }
-    } else {
-      LOG.error("Null snapshot location got from OM.");
+
+    if (dbSnapshot == null) {
+      LOG.error("Failed to obtain a valid DB snapshot from Ozone Manager. This 
could be due to " +
+          "missing SST files or other fetch issues.");
+      return false;
+    }
+
+    if (dbSnapshot.getCheckpointLocation() == null) {
+      LOG.error("Snapshot checkpoint location is null, indicating a failure to 
properly fetch or " +
+          "store the snapshot.");
+      return false;
+    }
+
+    LOG.info("Attempting to update Recon OM DB with new snapshot located at: 
{}",
+        dbSnapshot.getCheckpointLocation());
+    try {
+      
omMetadataManager.updateOmDB(dbSnapshot.getCheckpointLocation().toFile());
+      LOG.info("Successfully updated Recon OM DB with new snapshot.");

Review Comment:
   I would recommend this to change to debug level. because this will flood the 
logs of recon. And also log the sequence number being used to fetch the OM DB 
snapshot. 
   And check if this task details being updated in ReconTask table as suggested 
above with sequence number and list of SST files.



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java:
##########
@@ -366,6 +374,14 @@ connectionFactory, getOzoneManagerSnapshotUrl(),
       reconUtils.untarCheckpointFile(targetFile, untarredDbDir);
       FileUtils.deleteQuietly(targetFile);
 
+      // Validate the presence of required SST files
+      File[] sstFiles = untarredDbDir.toFile().listFiles((dir, name) -> 
name.endsWith(".sst"));
+      if (sstFiles == null || sstFiles.length == 0) {
+        LOG.warn("No SST files found in the OM snapshot directory: {}", 
untarredDbDir);
+        return null;
+      }
+      LOG.info("Valid OM snapshot with SST files found at: {}", untarredDbDir);

Review Comment:
   In addition to debug level, I would request to log the list of valid SST 
files, so in case of issues where compaction of SST files happens at OM, SST 
files fetched may not be exactly matching at Recon side, or another scenario 
where due to some unexpected issue at OM, SST files being fetched but not all 
being fetched since the last sequence number, then printing the list of SST 
files along with corresponding sequence number as debug log would be helpful to 
debug.
   
   1. Add the last sequence number somewhere in DB under ReconTask table with 
the corresponding task details and list of SST files, if not done yet. This can 
be added later in troubleshooting doc in case of OM DB display data and 
customer can get the details from table.
   2. Same thing need to do for other tasks like NSSummary tasks which may 
impact DU and customer should be able to troubleshoot on its own.



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/OzoneManagerServiceProviderImpl.java:
##########
@@ -567,6 +594,41 @@ public boolean syncDataFromOM() {
     return true;
   }
 
+  private void checkAndValidateReconDbPermissions() {
+    File dbDir = new File(omSnapshotDBParentDir.getPath());
+    if (!dbDir.exists()) {
+      LOG.error("Recon DB directory does not exist: {}", 
dbDir.getAbsolutePath());
+      return;
+    }
+
+    try {
+      // Fetch expected permissions from configuration
+      String expectedPermissions =
+          ServerUtils.getPermissions(ReconConfigKeys.OZONE_RECON_DB_DIR, 
configuration);
+      String expectedPermissionsSymbolic =
+          ReconUtils.convertNumericToSymbolic(expectedPermissions);
+
+      // Get actual permissions
+      Set<PosixFilePermission> actualPermissions =
+          Files.getPosixFilePermissions(dbDir.toPath());
+      String actualPermissionsStr =
+          PosixFilePermissions.toString(actualPermissions);
+
+      if (!expectedPermissionsSymbolic.equals(actualPermissionsStr)) {
+        LOG.warn("Permissions for Recon DB directory '{}' are set to '{}', but 
expected '{}'",
+            dbDir.getAbsolutePath(), actualPermissionsStr, 
expectedPermissionsSymbolic);
+      } else {
+        LOG.info("Permissions for Recon DB directory '{}' are correctly set to 
'{}'",

Review Comment:
   This log as well will flood logs of recon.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to