Copilot commented on code in PR #9739:
URL: https://github.com/apache/ozone/pull/9739#discussion_r2785460958


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/AbstractOzoneFileSystemTestWithFSO.java:
##########
@@ -552,4 +553,46 @@ long verifyDirKey(long volumeId, long bucketId, long parentId,
     return dirInfo.getObjectID();
   }
 
+  /**
+   * Test to reproduce "Directory Not Empty" bug using public FileSystem API.
+   * Tests both checkSubDirectoryExists() and checkSubFileExists() paths.
+   * Creates child directory and file, deletes them, then tries to delete parent.
+   * This test exposes a Ratis transaction visibility issue where deleted
+   * entries are in cache but not yet flushed to DB via double buffer.
+   */
+  @Test
+  public void testDeleteParentAfterChildDeleted() throws Exception {
+    Path parent = new Path("/parent");
+    Path childDir = new Path(parent, "childDir");
+    Path childFile = new Path(parent, "childFile");
+
+    // Create parent directory
+    assertTrue(getFs().mkdirs(parent));
+    // Create child directory (tests checkSubDirectoryExists path)
+    assertTrue(getFs().mkdirs(childDir));
+    // Create child file (tests checkSubFileExists path)
+    ContractTestUtils.touch(getFs(), childFile);
+
+    // Stop double buffer to prevent flushing deleted entries to DB
+    // This makes the bug reproduce more reliably
+    OzoneManagerDoubleBuffer doubleBuffer = getCluster().getOzoneManager()
+        .getOmRatisServer().getOmStateMachine().getOzoneManagerDoubleBuffer();
+    doubleBuffer.stopDaemon();
+

Review Comment:
   `stopDaemon()` fully stops the OM double buffer flush thread (it interrupts
and joins the daemon). Calling `resume()` afterwards only flips the `isRunning`
flag and does not restart the thread, so subsequent operations/tests in this 
PER_CLASS test suite may never flush to RocksDB. Use `pause()/unpause()` 
instead (or avoid stopping the daemon thread entirely) to block flushing during 
the test without killing the thread.
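   A minimal sketch of what the setup could look like with that change, assuming `pause()` blocks the flush loop without interrupting the daemon thread as described above (this is a suggestion, not the author's patch):
   ```java
   // Pause flushing for the duration of the test instead of killing the daemon.
   OzoneManagerDoubleBuffer doubleBuffer = getCluster().getOzoneManager()
       .getOmRatisServer().getOmStateMachine().getOzoneManagerDoubleBuffer();
   doubleBuffer.pause();   // instead of doubleBuffer.stopDaemon()
   ```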



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java:
##########
@@ -911,11 +930,15 @@ private static boolean checkSubFileExists(OmKeyInfo omKeyInfo,
     Iterator<Map.Entry<CacheKey<String>, CacheValue<OmKeyInfo>>>
             cacheIter = fileTable.cacheIterator();
 
+    // Track deleted entries in cache (null values = pending Ratis delete)
+    Set<String> deletedKeysInCache = new HashSet<>();
     while (cacheIter.hasNext()) {
       Map.Entry<CacheKey<String>, CacheValue<OmKeyInfo>> entry =
               cacheIter.next();
       OmKeyInfo cacheOmFileInfo = entry.getValue().getCacheValue();
       if (cacheOmFileInfo == null) {
+        // Entry marked for deletion in cache (Ratis transaction committed but not yet flushed to DB)
+        deletedKeysInCache.add(entry.getKey().getCacheKey());
         continue;

Review Comment:
   Same as `checkSubDirectoryExists`: building a `Set` of all tombstoned cache 
keys scales with the full cache size, while the DB loop only needs to know if 
the current `dbKey` is tombstoned. Prefer an on-demand 
`fileTable.getCacheValue(new CacheKey<>(dbKey))` check (as done in 
`OmMetadataManagerImpl#isKeyPresentInTable`) or restrict tracking to the 
relevant prefix to keep `hasChildren()` efficient.
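   A minimal sketch of the on-demand check inside the DB iteration, assuming `dbKey` is the key read from the file table iterator (variable names are illustrative and may differ in the actual patch):
   ```java
   // Instead of pre-collecting every tombstoned cache key, ask the cache about
   // the specific key just read from RocksDB.
   CacheValue<OmKeyInfo> cached = fileTable.getCacheValue(new CacheKey<>(dbKey));
   if (cached != null && cached.getCacheValue() == null) {
     // Delete committed via Ratis but not yet flushed to DB: treat as absent.
     continue;
   }
   ```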



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java:
##########
@@ -866,11 +868,15 @@ private static boolean checkSubDirectoryExists(OmKeyInfo omKeyInfo,
     Iterator<Map.Entry<CacheKey<String>, CacheValue<OmDirectoryInfo>>>
             cacheIter = dirTable.cacheIterator();
 
+    // Track deleted entries in cache (null values = pending Ratis delete)
+    Set<String> deletedKeysInCache = new HashSet<>();
     while (cacheIter.hasNext()) {
       Map.Entry<CacheKey<String>, CacheValue<OmDirectoryInfo>> entry =
               cacheIter.next();
       OmDirectoryInfo cacheOmDirInfo = entry.getValue().getCacheValue();
       if (cacheOmDirInfo == null) {
+        // Entry marked for deletion in cache (Ratis transaction committed but not yet flushed to DB)
+        deletedKeysInCache.add(entry.getKey().getCacheKey());

Review Comment:
   `deletedKeysInCache` is populated by iterating the entire table cache and 
storing every tombstoned key, even though the DB scan only needs to know 
whether the current `dbKey` is tombstoned. This adds avoidable CPU/memory 
overhead on the delete path. Consider checking tombstones on-demand via 
`dirTable.getCacheValue(new CacheKey<>(dbKey))` (pattern used in 
`OmMetadataManagerImpl#isKeyPresentInTable`) or at least limiting tracking to 
keys under the `seekDirInDB` prefix.
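   If on-demand lookups are not an option, a sketch of the prefix-restricted variant, assuming `seekDirInDB` holds the parent prefix used by the DB scan (names taken from this comment, shown for illustration only):
   ```java
   // Only remember tombstones that the DB scan below could actually hit.
   Set<String> deletedKeysInCache = new HashSet<>();
   while (cacheIter.hasNext()) {
     Map.Entry<CacheKey<String>, CacheValue<OmDirectoryInfo>> entry = cacheIter.next();
     String cacheKey = entry.getKey().getCacheKey();
     if (entry.getValue().getCacheValue() == null && cacheKey.startsWith(seekDirInDB)) {
       deletedKeysInCache.add(cacheKey);
     }
   }
   ```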



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/key/TestOMKeyDeleteRequestWithFSO.java:
##########
@@ -259,4 +262,75 @@ public void testDeleteDirectoryWithColonInFSOBucket() throws Exception {
    assertEquals(OzoneManagerProtocolProtos.Status.OK, response.getOMResponse().getStatus());
     assertNull(omMetadataManager.getDirectoryTable().get(dirName));
   }
+
+  private OMRequest createDeleteKeyRequest(String keyPath, boolean recursive) {
+    KeyArgs keyArgs = KeyArgs.newBuilder()
+        .setBucketName(bucketName)
+        .setVolumeName(volumeName)
+        .setKeyName(keyPath)
+        .setRecursive(recursive)
+        .build();
+
+    DeleteKeyRequest deleteKeyRequest =
+        DeleteKeyRequest.newBuilder().setKeyArgs(keyArgs).build();
+
+    return OMRequest.newBuilder()
+        .setDeleteKeyRequest(deleteKeyRequest)
+        .setCmdType(OzoneManagerProtocolProtos.Type.DeleteKey)
+        .setClientId(UUID.randomUUID().toString())
+        .build();
+  }
+
+  /**
+   * Minimal test to reproduce "Directory Not Empty" bug with Ratis.
+   * Tests both checkSubDirectoryExists() and checkSubFileExists() paths.
+   * Creates child directory and file, deletes them, then tries to delete parent.
+   * This test exposes a Ratis transaction visibility issue where deleted
+   * entries are in cache but not yet flushed to DB via double buffer.
+   */
+  @Test
+  public void testDeleteParentAfterChildDeleted() throws Exception {
+    OMRequestTestUtils.addVolumeAndBucketToDB(volumeName, bucketName, omMetadataManager, getBucketLayout());
+
+    String parentDir = "parent";
+    long parentId = OMRequestTestUtils.addParentsToDirTable(volumeName, bucketName, parentDir, omMetadataManager);
+
+    // Create a child directory (tests checkSubDirectoryExists path)
+    OMRequestTestUtils.addParentsToDirTable(volumeName, bucketName, parentDir + "/childDir", omMetadataManager);
+    

Review Comment:
   There is trailing whitespace on this otherwise blank line. Please remove it 
to avoid checkstyle/formatting failures and reduce noisy diffs.
   ```suggestion
   
   ```



##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/AbstractOzoneFileSystemTestWithFSO.java:
##########
@@ -552,4 +553,46 @@ long verifyDirKey(long volumeId, long bucketId, long parentId,
     return dirInfo.getObjectID();
   }
 
+  /**
+   * Test to reproduce "Directory Not Empty" bug using public FileSystem API.
+   * Tests both checkSubDirectoryExists() and checkSubFileExists() paths.
+   * Creates child directory and file, deletes them, then tries to delete parent.
+   * This test exposes a Ratis transaction visibility issue where deleted
+   * entries are in cache but not yet flushed to DB via double buffer.
+   */
+  @Test
+  public void testDeleteParentAfterChildDeleted() throws Exception {
+    Path parent = new Path("/parent");
+    Path childDir = new Path(parent, "childDir");
+    Path childFile = new Path(parent, "childFile");
+
+    // Create parent directory
+    assertTrue(getFs().mkdirs(parent));
+    // Create child directory (tests checkSubDirectoryExists path)
+    assertTrue(getFs().mkdirs(childDir));
+    // Create child file (tests checkSubFileExists path)
+    ContractTestUtils.touch(getFs(), childFile);
+
+    // Stop double buffer to prevent flushing deleted entries to DB
+    // This makes the bug reproduce more reliably
+    OzoneManagerDoubleBuffer doubleBuffer = getCluster().getOzoneManager()
+        .getOmRatisServer().getOmStateMachine().getOzoneManagerDoubleBuffer();
+    doubleBuffer.stopDaemon();
+
+    try {
+      // Delete child directory
+      assertTrue(getFs().delete(childDir, false), "Child directory delete should succeed");
+      // Delete child file
+      assertTrue(getFs().delete(childFile, false), "Child file delete should succeed");
+
+      // Try to delete parent directory (should succeed but may fail with the bug)
+      // Without the fix, this fails because deleted children are still in DB
+      boolean parentDeleted = getFs().delete(parent, false);
+      assertTrue(parentDeleted, "Parent delete should succeed after children deleted");
+    } finally {
+      // Resume double buffer to avoid affecting other tests
+      doubleBuffer.resume();
+    }

Review Comment:
   The `finally` block calls `doubleBuffer.resume()`, but `resume()` does not 
restart the flush thread after `stopDaemon()`; it only sets `isRunning=true`. 
This leaves the double buffer in an inconsistent state for the rest of the test 
class. If you switch to `pause()`, this should be `unpause()` (or otherwise 
ensure the daemon thread is actually running again).
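   A sketch of the corresponding cleanup, assuming the `pause()`/`unpause()` pair suggested in the earlier comment (illustrative only, not the author's patch):
   ```java
   doubleBuffer.pause();
   try {
     assertTrue(getFs().delete(childDir, false), "Child directory delete should succeed");
     assertTrue(getFs().delete(childFile, false), "Child file delete should succeed");
     assertTrue(getFs().delete(parent, false), "Parent delete should succeed after children deleted");
   } finally {
     // unpause() pairs with pause(); unlike resume() after stopDaemon(), the
     // flush daemon thread was never stopped, so later tests keep flushing.
     doubleBuffer.unpause();
   }
   ```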



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

