Copilot commented on code in PR #9967:
URL: https://github.com/apache/gravitino/pull/9967#discussion_r2793086349
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
try {
Table table = loadTable(ident);
String location = table.properties().get(Table.PROPERTY_LOCATION);
+ boolean external =
+ Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
boolean purged = super.purgeTable(ident);
+ // If the table is a managed table, super.purgeTable will call dropTable
to remove the
+ // underlying Lance dataset, so we don't need to do anything here.
+ if (!external) {
+ return purged;
+ }
+
+ Map<String, String> lanceStorageOptions =
+ LancePropertiesUtils.getLanceStorageOptions(table.properties());
// If the table metadata is purged successfully, we can delete the Lance
dataset.
// Otherwise, we should not delete the dataset.
if (purged) {
- // Delete the Lance dataset at the location
- Dataset.drop(location,
LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+ // Delete the Lance dataset at the location, first we try to detect
whether the dataset
+ // still exists before the purge operation. If it does not exist, then
it means there is
+ // not need to call DataSet#drop to delete the dataset. If it still
exists, then we proceed
Review Comment:
Inconsistent capitalization in comment: 'DataSet#drop' should be
'Dataset#drop' to match the actual class name (com.lancedb.lance.Dataset).
```suggestion
// not need to call Dataset#drop to delete the dataset. If it still
exists, then we proceed
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
try {
Table table = loadTable(ident);
String location = table.properties().get(Table.PROPERTY_LOCATION);
+ boolean external =
+ Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
boolean purged = super.purgeTable(ident);
+ // If the table is a managed table, super.purgeTable will call dropTable
to remove the
+ // underlying Lance dataset, so we don't need to do anything here.
+ if (!external) {
+ return purged;
+ }
+
+ Map<String, String> lanceStorageOptions =
+ LancePropertiesUtils.getLanceStorageOptions(table.properties());
// If the table metadata is purged successfully, we can delete the Lance
dataset.
// Otherwise, we should not delete the dataset.
if (purged) {
- // Delete the Lance dataset at the location
- Dataset.drop(location,
LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+ // Delete the Lance dataset at the location, first we try to detect
whether the dataset
+ // still exists before the purge operation. If it does not exist, then
it means there is
+ // not need to call DataSet#drop to delete the dataset. If it still
exists, then we proceed
Review Comment:
Typo in comment: 'there is not need' should be 'there is no need'.
```suggestion
// no need to call DataSet#drop to delete the dataset. If it still
exists, then we proceed
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
try {
Table table = loadTable(ident);
String location = table.properties().get(Table.PROPERTY_LOCATION);
+ boolean external =
+ Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
boolean purged = super.purgeTable(ident);
+ // If the table is a managed table, super.purgeTable will call dropTable
to remove the
+ // underlying Lance dataset, so we don't need to do anything here.
+ if (!external) {
+ return purged;
+ }
+
+ Map<String, String> lanceStorageOptions =
+ LancePropertiesUtils.getLanceStorageOptions(table.properties());
// If the table metadata is purged successfully, we can delete the Lance
dataset.
// Otherwise, we should not delete the dataset.
if (purged) {
- // Delete the Lance dataset at the location
- Dataset.drop(location,
LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+ // Delete the Lance dataset at the location, first we try to detect
whether the dataset
+ // still exists before the purge operation. If it does not exist, then
it means there is
+ // not need to call DataSet#drop to delete the dataset. If it still
exists, then we proceed
+ // to delete the dataset.
+ ReadOptions readOptions =
+ new
ReadOptions.Builder().setStorageOptions(lanceStorageOptions).build();
+ try (Dataset ignored = Dataset.open(new RootAllocator(), location,
readOptions)) {
Review Comment:
Resource leak: The RootAllocator created here is never closed. RootAllocator
should be managed with try-with-resources or explicitly closed to prevent
memory leaks. The allocator is created solely for the Dataset.open call and
should be closed immediately after the try block completes or fails.
```suggestion
try (RootAllocator allocator = new RootAllocator();
Dataset ignored = Dataset.open(allocator, location,
readOptions)) {
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
try {
Table table = loadTable(ident);
String location = table.properties().get(Table.PROPERTY_LOCATION);
+ boolean external =
+ Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
boolean purged = super.purgeTable(ident);
+ // If the table is a managed table, super.purgeTable will call dropTable
to remove the
+ // underlying Lance dataset, so we don't need to do anything here.
+ if (!external) {
+ return purged;
+ }
+
+ Map<String, String> lanceStorageOptions =
+ LancePropertiesUtils.getLanceStorageOptions(table.properties());
// If the table metadata is purged successfully, we can delete the Lance
dataset.
// Otherwise, we should not delete the dataset.
if (purged) {
- // Delete the Lance dataset at the location
- Dataset.drop(location,
LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+ // Delete the Lance dataset at the location, first we try to detect
whether the dataset
+ // still exists before the purge operation. If it does not exist, then
it means there is
+ // not need to call DataSet#drop to delete the dataset. If it still
exists, then we proceed
+ // to delete the dataset.
Review Comment:
Misleading comment: The comment states 'first we try to detect whether the
dataset still exists before the purge operation', but this check happens AFTER
super.purgeTable() has already executed (line 199). The comment should clarify
that this checks whether the dataset exists before attempting to delete it with
Dataset.drop, not before the purge operation itself.
```suggestion
// Before deleting the Lance dataset at the location with
Dataset.drop, first try to
// detect whether the dataset still exists. If it does not exist,
then there is no need
// to call Dataset.drop to delete the dataset. If it still exists,
then proceed to delete
// the dataset.
```
##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java:
##########
@@ -153,6 +153,84 @@ public void resetSchema() throws InterruptedException {
createSchema();
}
+ @Test
+ public void testPurgeTableWhenLocationMissing() throws IOException {
+ Column[] columns = createColumns();
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier, columns, TABLE_COMMENT, properties,
Transforms.EMPTY_TRANSFORM, null, null);
+ Assertions.assertTrue(new File(location).exists());
+
+ // Simulate missing data before purge
+ FileUtils.deleteDirectory(new File(location));
+ Assertions.assertFalse(new File(location).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
+
+ @Test
+ public void testPurgeTableWhenLocationNeverCreated() {
+ Column[] columns = createColumns();
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier, columns, TABLE_COMMENT, properties,
Transforms.EMPTY_TRANSFORM, null, null);
+ // Normal case, the location should be created
+ Assertions.assertTrue(new File(location).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
+
+ @Test
+ public void testPurgeManagedTableWithAutoLocation() {
+ Column[] columns = createColumns();
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "false");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ Table table =
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null);
+ String resolvedLocation = table.properties().get(Table.PROPERTY_LOCATION);
+ Assertions.assertNotNull(resolvedLocation);
+ Assertions.assertTrue(new File(resolvedLocation).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
Review Comment:
Missing test scenario: This test should also verify the case where the
managed table's dataset location is deleted (or was never created via
createEmptyTable) before calling purgeTable. Currently, it only tests the happy
path. Add a test case that deletes the location directory before calling
purgeTable to ensure no exception is thrown.
##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java:
##########
@@ -153,6 +153,84 @@ public void resetSchema() throws InterruptedException {
createSchema();
}
+ @Test
+ public void testPurgeTableWhenLocationMissing() throws IOException {
+ Column[] columns = createColumns();
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier, columns, TABLE_COMMENT, properties,
Transforms.EMPTY_TRANSFORM, null, null);
+ Assertions.assertTrue(new File(location).exists());
+
+ // Simulate missing data before purge
+ FileUtils.deleteDirectory(new File(location));
+ Assertions.assertFalse(new File(location).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
+
+ @Test
+ public void testPurgeTableWhenLocationNeverCreated() {
+ Column[] columns = createColumns();
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier, columns, TABLE_COMMENT, properties,
Transforms.EMPTY_TRANSFORM, null, null);
+ // Normal case, the location should be created
+ Assertions.assertTrue(new File(location).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
+
+ @Test
+ public void testPurgeManagedTableWithAutoLocation() {
+ Column[] columns = createColumns();
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "false");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ Table table =
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier,
+ columns,
+ TABLE_COMMENT,
+ properties,
+ Transforms.EMPTY_TRANSFORM,
+ null,
+ null);
+ String resolvedLocation = table.properties().get(Table.PROPERTY_LOCATION);
+ Assertions.assertNotNull(resolvedLocation);
+ Assertions.assertTrue(new File(resolvedLocation).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
Review Comment:
Incomplete test coverage: This test should also verify that the dataset
location is actually deleted after purging. Currently, it only checks that the
table metadata is removed (tableExists returns false), but doesn't verify that
the physical dataset directory at 'resolvedLocation' was cleaned up.
##########
catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java:
##########
@@ -153,6 +153,84 @@ public void resetSchema() throws InterruptedException {
createSchema();
}
+ @Test
+ public void testPurgeTableWhenLocationMissing() throws IOException {
+ Column[] columns = createColumns();
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier, columns, TABLE_COMMENT, properties,
Transforms.EMPTY_TRANSFORM, null, null);
+ Assertions.assertTrue(new File(location).exists());
+
+ // Simulate missing data before purge
+ FileUtils.deleteDirectory(new File(location));
+ Assertions.assertFalse(new File(location).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
+
+ @Test
+ public void testPurgeTableWhenLocationNeverCreated() {
+ Column[] columns = createColumns();
+ String location = tempDirectory + "/" +
GravitinoITUtils.genRandomName(TABLE_PREFIX) + "/";
+ NameIdentifier identifier =
+ NameIdentifier.of(schemaName,
GravitinoITUtils.genRandomName(TABLE_PREFIX));
+ Map<String, String> properties = createProperties();
+ properties.put(Table.PROPERTY_TABLE_FORMAT, LANCE_TABLE_FORMAT);
+ properties.put(Table.PROPERTY_EXTERNAL, "true");
+ properties.put(Table.PROPERTY_LOCATION, location);
+
+ catalog
+ .asTableCatalog()
+ .createTable(
+ identifier, columns, TABLE_COMMENT, properties,
Transforms.EMPTY_TRANSFORM, null, null);
+ // Normal case, the location should be created
+ Assertions.assertTrue(new File(location).exists());
+
+ Assertions.assertDoesNotThrow(() ->
catalog.asTableCatalog().purgeTable(identifier));
+ Assertions.assertFalse(catalog.asTableCatalog().tableExists(identifier));
+ }
Review Comment:
Misleading test name: The test is named
'testPurgeTableWhenLocationNeverCreated' but the test actually creates the
location at line 197 (Assertions.assertTrue(new File(location).exists())). This
test appears to be testing the normal case where the location exists and should
be purged successfully. Consider renaming to something like
'testPurgeExternalTableWithExistingLocation' or removing this test if it
duplicates other coverage.
##########
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java:
##########
@@ -899,7 +899,7 @@ void testDropTable() {
// Drop the table
DropTableRequest dropTableRequest = new DropTableRequest();
dropTableRequest.setId(ids);
Review Comment:
Test change may hide bugs: The original test expected dropTable to throw an
exception, but now it expects no exception. However, the test doesn't verify
WHY it was expected to throw before. If the original test was checking that
dropping an empty table (created via createEmptyTable) throws an exception
because the dataset was never created, then this change just masks the issue
without documenting the behavioral change. Consider adding a comment explaining
why the behavior changed from throwing to not throwing.
```suggestion
dropTableRequest.setId(ids);
// Historically, dropping a table created via createEmptyTable was
expected to throw
// if the underlying dataset had not been created yet. The Lance REST
API was updated
// to allow dropping such empty tables without error, so this test now
asserts that
// dropTable does not throw here and relies on the following checks
(describe/drop
// again returning 404) to verify that the table is actually removed.
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -190,13 +191,40 @@ public boolean purgeTable(NameIdentifier ident) {
try {
Table table = loadTable(ident);
String location = table.properties().get(Table.PROPERTY_LOCATION);
+ boolean external =
+ Optional.ofNullable(table.properties().get(Table.PROPERTY_EXTERNAL))
+ .map(Boolean::parseBoolean)
+ .orElse(false);
boolean purged = super.purgeTable(ident);
+ // If the table is a managed table, super.purgeTable will call dropTable
to remove the
+ // underlying Lance dataset, so we don't need to do anything here.
+ if (!external) {
+ return purged;
+ }
+
+ Map<String, String> lanceStorageOptions =
+ LancePropertiesUtils.getLanceStorageOptions(table.properties());
// If the table metadata is purged successfully, we can delete the Lance
dataset.
// Otherwise, we should not delete the dataset.
if (purged) {
- // Delete the Lance dataset at the location
- Dataset.drop(location,
LancePropertiesUtils.getLanceStorageOptions(table.properties()));
+ // Delete the Lance dataset at the location, first we try to detect
whether the dataset
+ // still exists before the purge operation. If it does not exist, then
it means there is
+ // not need to call DataSet#drop to delete the dataset. If it still
exists, then we proceed
+ // to delete the dataset.
+ ReadOptions readOptions =
+ new
ReadOptions.Builder().setStorageOptions(lanceStorageOptions).build();
+ try (Dataset ignored = Dataset.open(new RootAllocator(), location,
readOptions)) {
+ } catch (Exception e) {
+ LOG.warn(
+ "Failed to open Lance dataset at location {} with options {}, it
may have already been deleted. Skipping dataset deletion.",
+ location,
+ readOptions,
+ e);
+ return purged;
Review Comment:
This catch-all Exception handler may silently ignore legitimate errors that
should be propagated. The intent is to detect missing datasets, but this will
also catch and suppress other exceptions like permission errors, I/O errors, or
programming errors. Consider catching more specific exceptions (e.g.,
FileNotFoundException or whatever specific exception Lance throws for missing
datasets) and re-throwing unexpected exceptions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]