Copilot commented on code in PR #9967:
URL: https://github.com/apache/gravitino/pull/9967#discussion_r2793086349


##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
     try {
       Table table = loadTable(ident);
       String location = table.properties().get(Table.PROPERTY_LOCATION);
+      boolean external =
+          Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+              .map(Boolean::parseBoolean)
+              .orElse(false);
 
       boolean purged = super.purgeTable(ident);
+      // If the table is a managed table, super.purgeTable will call dropTable to remove the
+      // underlying Lance dataset, so we don't need to do anything here.
+      if (!external) {
+        return purged;
+      }
+
+      Map<String, String> lanceStorageOptions =
+          LancePropertiesUtils.getLanceStorageOptions(table.properties());
       // If the table metadata is purged successfully, we can delete the Lance dataset.
       // Otherwise, we should not delete the dataset.
       if (purged) {
-        // Delete the Lance dataset at the location
-        Dataset.drop(location, LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+        // Delete the Lance dataset at the location, first we try to detect whether the dataset
+        // still exists before the purge operation. If it does not exist, then it means there is
+        // not need to call DataSet#drop to delete the dataset. If it still exists, then we proceed

Review Comment:
   Inconsistent capitalization in comment: 'DataSet#drop' should be 
'Dataset#drop' to match the actual class name (com.lancedb.lance.Dataset).
   ```suggestion
        // not need to call Dataset#drop to delete the dataset. If it still exists, then we proceed
   ```



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
     try {
       Table table = loadTable(ident);
       String location = table.properties().get(Table.PROPERTY_LOCATION);
+      boolean external =
+          Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+              .map(Boolean::parseBoolean)
+              .orElse(false);
 
       boolean purged = super.purgeTable(ident);
+      // If the table is a managed table, super.purgeTable will call dropTable to remove the
+      // underlying Lance dataset, so we don't need to do anything here.
+      if (!external) {
+        return purged;
+      }
+
+      Map<String, String> lanceStorageOptions =
+          LancePropertiesUtils.getLanceStorageOptions(table.properties());
       // If the table metadata is purged successfully, we can delete the Lance dataset.
       // Otherwise, we should not delete the dataset.
       if (purged) {
-        // Delete the Lance dataset at the location
-        Dataset.drop(location, LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+        // Delete the Lance dataset at the location, first we try to detect whether the dataset
+        // still exists before the purge operation. If it does not exist, then it means there is
+        // not need to call DataSet#drop to delete the dataset. If it still exists, then we proceed

Review Comment:
   Typo in comment: 'there is not need' should be 'there is no need'.
   ```suggestion
        // no need to call DataSet#drop to delete the dataset. If it still exists, then we proceed
   ```



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
     try {
       Table table = loadTable(ident);
       String location = table.properties().get(Table.PROPERTY_LOCATION);
+      boolean external =
+          Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+              .map(Boolean::parseBoolean)
+              .orElse(false);
 
       boolean purged = super.purgeTable(ident);
+      // If the table is a managed table, super.purgeTable will call dropTable to remove the
+      // underlying Lance dataset, so we don't need to do anything here.
+      if (!external) {
+        return purged;
+      }
+
+      Map<String, String> lanceStorageOptions =
+          LancePropertiesUtils.getLanceStorageOptions(table.properties());
       // If the table metadata is purged successfully, we can delete the Lance dataset.
       // Otherwise, we should not delete the dataset.
       if (purged) {
-        // Delete the Lance dataset at the location
-        Dataset.drop(location, LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+        // Delete the Lance dataset at the location, first we try to detect whether the dataset
+        // still exists before the purge operation. If it does not exist, then it means there is
+        // not need to call DataSet#drop to delete the dataset. If it still exists, then we proceed
+        // to delete the dataset.
+        ReadOptions readOptions =
+            new ReadOptions.Builder().setStorageOptions(lanceStorageOptions).build();
+        try (Dataset ignored = Dataset.open(new RootAllocator(), location, readOptions)) {

Review Comment:
   Resource leak: The RootAllocator created here is never closed. RootAllocator 
should be managed with try-with-resources or explicitly closed to prevent 
memory leaks. The allocator is created solely for the Dataset.open call and 
should be closed immediately after the try block completes or fails.
   ```suggestion
        try (RootAllocator allocator = new RootAllocator();
            Dataset ignored = Dataset.open(allocator, location, readOptions)) {
   ```
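
   The safety property the suggestion relies on can be seen with a minimal, self-contained sketch (plain `AutoCloseable` stand-ins, not the real Arrow/Lance classes): try-with-resources closes every declared resource automatically, in reverse declaration order, even when the body is empty or a later initializer throws.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class TryWithResourcesDemo {
     static final List<String> closed = new ArrayList<>();

     // Stand-in resource that records when it is closed.
     static class Resource implements AutoCloseable {
       private final String name;
       Resource(String name) { this.name = name; }
       @Override public void close() { closed.add(name); }
     }

     public static void main(String[] args) {
       try (Resource allocator = new Resource("allocator");
           Resource dataset = new Resource("dataset")) {
         // Empty body, mirroring the existence-probe in purgeTable.
       }
       // Closed in reverse declaration order: dataset first, then allocator.
       System.out.println(closed); // prints [dataset, allocator]
     }
   }
   ```

   With the original form, only `Dataset` is a resource of the try, so the `RootAllocator` constructed inline is never closed; declaring both in the resource list fixes that without changing behavior.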



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
     try {
       Table table = loadTable(ident);
       String location = table.properties().get(Table.PROPERTY_LOCATION);
+      boolean external =
+          Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+              .map(Boolean::parseBoolean)
+              .orElse(false);
 
       boolean purged = super.purgeTable(ident);
+      // If the table is a managed table, super.purgeTable will call dropTable to remove the
+      // underlying Lance dataset, so we don't need to do anything here.
+      if (!external) {
+        return purged;
+      }
+
+      Map<String, String> lanceStorageOptions =
+          LancePropertiesUtils.getLanceStorageOptions(table.properties());
       // If the table metadata is purged successfully, we can delete the Lance dataset.
       // Otherwise, we should not delete the dataset.
       if (purged) {
-        // Delete the Lance dataset at the location
-        Dataset.drop(location, LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+        // Delete the Lance dataset at the location, first we try to detect whether the dataset
+        // still exists before the purge operation. If it does not exist, then it means there is
+        // not need to call DataSet#drop to delete the dataset. If it still exists, then we proceed
+        // to delete the dataset.

Review Comment:
   Misleading comment: The comment states 'first we try to detect whether the 
dataset still exists before the purge operation', but this check happens AFTER 
super.purgeTable() has already executed (line 199). The comment should clarify 
that this checks whether the dataset exists before attempting to delete it with 
Dataset.drop, not before the purge operation itself.
   ```suggestion
        // Before deleting the Lance dataset at the location with Dataset.drop, first try to
        // detect whether the dataset still exists. If it does not exist, then there is no need
        // to call Dataset.drop to delete the dataset. If it still exists, then proceed to delete
        // the dataset.
   ```



##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java:
##########
@@ -153,6 +153,84 @@ public void resetSchema() throws InterruptedException {
     createSchema();
   }
 
+  @Test
+  public void testPurgeTableWhenLocationMissing() throws IOException {
+    Column[] columns = createColumns();
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "true");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    catalog
+        .asTableCatalog()
+        .createTable(
+            identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
+    Assertions.assertTrue(new File(location).exists());
+
+    // Simulate missing data before purge
+    FileUtils.deleteDirectory(new File(location));
+    Assertions.assertFalse(new File(location).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }
+
+  @Test
+  public void testPurgeTableWhenLocationNeverCreated() {
+    Column[] columns = createColumns();
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "true");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    catalog
+        .asTableCatalog()
+        .createTable(
+            identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
+    // Normal case, the location should be created
+    Assertions.assertTrue(new File(location).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }
+
+  @Test
+  public void testPurgeManagedTableWithAutoLocation() {
+    Column[] columns = createColumns();
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "false");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    Table table =
+        catalog
+            .asTableCatalog()
+            .createTable(
+                identifier,
+                columns,
+                TABLE_COMMENT,
+                properties,
+                Transforms.EMPTY_TRANSFORM,
+                null,
+                null);
+    String resolvedLocation = table.properties().get(Table.PROPERTY_LOCATION);
+    Assertions.assertNotNull(resolvedLocation);
+    Assertions.assertTrue(new File(resolvedLocation).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }

Review Comment:
   Missing test scenario: This test should also verify the case where the 
managed table's dataset location is deleted (or was never created via 
createEmptyTable) before calling purgeTable. Currently, it only tests the happy 
path. Add a test case that deletes the location directory before calling 
purgeTable to ensure no exception is thrown.
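
   One way to cover that scenario, sketched in the style of the surrounding tests (illustrative only; it assumes the same helpers and fields of this test class — `createColumns`, `createProperties`, `schemaName`, `catalog`, etc. — and the hypothetical test name is mine):

   ```java
   @Test
   public void testPurgeManagedTableWhenLocationMissing() throws IOException {
     // Sketch: mirrors testPurgeTableWhenLocationMissing, but for a managed table.
     Column[] columns = createColumns();
     NameIdentifier identifier =
         NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
     Map<String, String> properties = createProperties();
     properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
     properties.put(Table.PROPERTY_EXTERNAL, "false");

     Table table =
         catalog
             .asTableCatalog()
             .createTable(
                 identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
     String resolvedLocation = table.properties().get(Table.PROPERTY_LOCATION);

     // Simulate the dataset disappearing before the purge.
     FileUtils.deleteDirectory(new File(resolvedLocation));

     Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
     Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
   }
   ```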



##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java:
##########
@@ -153,6 +153,84 @@ public void resetSchema() throws InterruptedException {
     createSchema();
   }
 
+  @Test
+  public void testPurgeTableWhenLocationMissing() throws IOException {
+    Column[] columns = createColumns();
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "true");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    catalog
+        .asTableCatalog()
+        .createTable(
+            identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
+    Assertions.assertTrue(new File(location).exists());
+
+    // Simulate missing data before purge
+    FileUtils.deleteDirectory(new File(location));
+    Assertions.assertFalse(new File(location).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }
+
+  @Test
+  public void testPurgeTableWhenLocationNeverCreated() {
+    Column[] columns = createColumns();
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "true");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    catalog
+        .asTableCatalog()
+        .createTable(
+            identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
+    // Normal case, the location should be created
+    Assertions.assertTrue(new File(location).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }
+
+  @Test
+  public void testPurgeManagedTableWithAutoLocation() {
+    Column[] columns = createColumns();
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "false");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    Table table =
+        catalog
+            .asTableCatalog()
+            .createTable(
+                identifier,
+                columns,
+                TABLE_COMMENT,
+                properties,
+                Transforms.EMPTY_TRANSFORM,
+                null,
+                null);
+    String resolvedLocation = table.properties().get(Table.PROPERTY_LOCATION);
+    Assertions.assertNotNull(resolvedLocation);
+    Assertions.assertTrue(new File(resolvedLocation).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }

Review Comment:
   Incomplete test coverage: This test should also verify that the dataset 
location is actually deleted after purging. Currently, it only checks that the 
table metadata is removed (tableExists returns false), but doesn't verify that 
the physical dataset directory at 'resolvedLocation' was cleaned up.



##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java:
##########
@@ -153,6 +153,84 @@ public void resetSchema() throws InterruptedException {
     createSchema();
   }
 
+  @Test
+  public void testPurgeTableWhenLocationMissing() throws IOException {
+    Column[] columns = createColumns();
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "true");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    catalog
+        .asTableCatalog()
+        .createTable(
+            identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
+    Assertions.assertTrue(new File(location).exists());
+
+    // Simulate missing data before purge
+    FileUtils.deleteDirectory(new File(location));
+    Assertions.assertFalse(new File(location).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }
+
+  @Test
+  public void testPurgeTableWhenLocationNeverCreated() {
+    Column[] columns = createColumns();
+    String location = tempDirectory + "/" + GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+    NameIdentifier identifier =
+        NameIdentifier.of(schemaName, GravitinoITUtils.genRandomName(TABLE_PREFIX));
+    Map<String, String> properties = createProperties();
+    properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+    properties.put(Table.PROPERTY_EXTERNAL, "true");
+    properties.put(Table.PROPERTY_LOCATION, location);
+
+    catalog
+        .asTableCatalog()
+        .createTable(
+            identifier, columns, TABLE_COMMENT, properties, Transforms.EMPTY_TRANSFORM, null, null);
+    // Normal case, the location should be created
+    Assertions.assertTrue(new File(location).exists());
+
+    Assertions.assertDoesNotThrow(() -> catalog.asTableCatalog().purgeTable(identifier));
+    Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+  }

Review Comment:
   Misleading test name: The test is named 
'testPurgeTableWhenLocationNeverCreated' but the test actually creates the 
location at line 197 (Assertions.assertTrue(new File(location).exists())). This 
test appears to be testing the normal case where the location exists and should 
be purged successfully. Consider renaming to something like 
'testPurgeExternalTableWithExistingLocation' or removing this test if it 
duplicates other coverage.



##########
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java:
##########
@@ -899,7 +899,7 @@ void testDropTable() {
     // Drop the table
     DropTableRequest dropTableRequest = new DropTableRequest();
     dropTableRequest.setId(ids);

Review Comment:
   Test change may hide bugs: The original test expected dropTable to throw an 
exception, but now it expects no exception. However, the test doesn't verify 
WHY it was expected to throw before. If the original test was checking that 
dropping an empty table (created via createEmptyTable) throws an exception 
because the dataset was never created, then this change just masks the issue 
without documenting the behavioral change. Consider adding a comment explaining 
why the behavior changed from throwing to not throwing.
   ```suggestion
       dropTableRequest.setId(ids);
       // Historically, dropping a table created via createEmptyTable was expected to throw
       // if the underlying dataset had not been created yet. The Lance REST API was updated
       // to allow dropping such empty tables without error, so this test now asserts that
       // dropTable does not throw here and relies on the following checks (describe/drop
       // again returning 404) to verify that the table is actually removed.
   ```



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
     try {
       Table table = loadTable(ident);
       String location = table.properties().get(Table.PROPERTY_LOCATION);
+      boolean external =
+          Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+              .map(Boolean::parseBoolean)
+              .orElse(false);
 
       boolean purged = super.purgeTable(ident);
+      // If the table is a managed table, super.purgeTable will call dropTable to remove the
+      // underlying Lance dataset, so we don't need to do anything here.
+      if (!external) {
+        return purged;
+      }
+
+      Map<String, String> lanceStorageOptions =
+          LancePropertiesUtils.getLanceStorageOptions(table.properties());
       // If the table metadata is purged successfully, we can delete the Lance dataset.
       // Otherwise, we should not delete the dataset.
       if (purged) {
-        // Delete the Lance dataset at the location
-        Dataset.drop(location, LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+        // Delete the Lance dataset at the location, first we try to detect whether the dataset
+        // still exists before the purge operation. If it does not exist, then it means there is
+        // not need to call DataSet#drop to delete the dataset. If it still exists, then we proceed
+        // to delete the dataset.
+        ReadOptions readOptions =
+            new ReadOptions.Builder().setStorageOptions(lanceStorageOptions).build();
+        try (Dataset ignored = Dataset.open(new RootAllocator(), location, readOptions)) {
+        } catch (Exception e) {
+          LOG.warn(
+              "Failed to open Lance dataset at location {} with options {}, it may have already been deleted. Skipping dataset deletion.",
+              location,
+              readOptions,
+              e);
+          return purged;

Review Comment:
   This catch-all Exception handler may silently ignore legitimate errors that 
should be propagated. The intent is to detect missing datasets, but this will 
also catch and suppress other exceptions like permission errors, I/O errors, or 
programming errors. Consider catching more specific exceptions (e.g., 
FileNotFoundException or whatever specific exception Lance throws for missing 
datasets) and re-throwing unexpected exceptions.
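
   The pattern being asked for can be shown with a self-contained sketch. `DatasetMissingException` and `openDataset` below are hypothetical stand-ins, not real Lance APIs; the actual exception type Lance throws for a missing dataset must be confirmed against the Lance Java SDK. The point is the shape: only the "missing" case is treated as benign, and everything else propagates.

   ```java
   public class NarrowCatchDemo {
     // Hypothetical stand-in for whatever exception Lance throws when a
     // dataset does not exist at the given location.
     static class DatasetMissingException extends RuntimeException {
       DatasetMissingException(String msg) { super(msg); }
     }

     // Hypothetical opener: fails with DatasetMissingException for paths
     // outside /data/, standing in for Dataset.open on a deleted location.
     static void openDataset(String location) {
       if (!location.startsWith("/data/")) {
         throw new DatasetMissingException("no dataset at " + location);
       }
     }

     /** Returns true if the dataset exists; only the "missing" case maps to false. */
     static boolean datasetExists(String location) {
       try {
         openDataset(location);
         return true;
         // Catch ONLY the "missing dataset" exception; permission errors,
         // I/O failures, and programming errors propagate to the caller
         // instead of being logged and swallowed.
       } catch (DatasetMissingException e) {
         return false; // expected and benign: nothing left to drop
       }
     }

     public static void main(String[] args) {
       System.out.println(datasetExists("/data/t1")); // true
       System.out.println(datasetExists("/tmp/gone")); // false
     }
   }
   ```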



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
