Copilot commented on code in PR #9127:
URL: https://github.com/apache/gravitino/pull/9127#discussion_r2671972514
##########
lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/rest/LanceTableOperations.java:
##########
@@ -268,4 +316,14 @@ private void
validateDropTableRequest(@SuppressWarnings("unused") DropTableReque
// We will ignore the id in the request body since it's already provided
in the path param
// No specific fields to validate for now
}
+
+ private void validateDropColumnsRequest(AlterTableDropColumnsRequest
request) {
+ Preconditions.checkArgument(
+ !request.getColumns().isEmpty(), "Columns to drop cannot be empty.");
Review Comment:
The validation only checks if the columns list is not empty, but doesn't
validate if the columns list or its elements are null. If request.getColumns()
returns null, this will throw a NullPointerException. Consider adding a null
check before calling isEmpty().
```suggestion
request.getColumns() != null && !request.getColumns().isEmpty(),
"Columns to drop cannot be empty.");
```
##########
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java:
##########
@@ -591,6 +597,101 @@ void testCreateTable() throws IOException, ApiException {
Set<String> stringSet = listResponse.getTables();
Assertions.assertEquals(1, stringSet.size());
Assertions.assertTrue(stringSet.contains(Joiner.on(".").join(ids)));
+
+ // Now try to drop columns in the table
+ AlterTableDropColumnsRequest dropColumnsRequest = new
AlterTableDropColumnsRequest();
+ dropColumnsRequest.setId(List.of(CATALOG_NAME, SCHEMA_NAME, "table"));
+ dropColumnsRequest.setColumns(List.of("value"));
+
+ // No alterTableDropColumns in Namespace interface, so we need to get
TableApi via reflection
+ RestNamespace restNamespace = (RestNamespace) ns;
+ TableApi tableApi = (TableApi) FieldUtils.readField(restNamespace,
"tableApi", true);
+ String delimiter = RestNamespaceConfig.DELIMITER_DEFAULT;
+
+ Assertions.assertDoesNotThrow(
+ () ->
+ tableApi.alterTableDropColumns(
+ String.join(delimiter, ids), dropColumnsRequest, delimiter));
+
+ describeTableRequest.setId(ids);
+ loadTable = ns.describeTable(describeTableRequest);
+ Assertions.assertNotNull(loadTable);
+ Assertions.assertEquals(newLocation, loadTable.getLocation());
+
+ jsonArrowFields = loadTable.getSchema().getFields();
+ Assertions.assertEquals(1, jsonArrowFields.size());
+ for (int i = 0; i < jsonArrowFields.size(); i++) {
+ JsonArrowField jsonArrowField = jsonArrowFields.get(i);
+ Field originalField = schema.getFields().get(i);
+ Assertions.assertEquals(originalField.getName(),
jsonArrowField.getName());
+
+ if (i == 0) {
+ Assertions.assertEquals("int32", jsonArrowField.getType().getType());
+ } else if (i == 1) {
+ Assertions.assertEquals("utf8", jsonArrowField.getType().getType());
+ }
+ }
+
+ // Drop a non-existing column should fail
+ AlterTableDropColumnsRequest dropNonExistingColumnsRequest = new
AlterTableDropColumnsRequest();
+ dropNonExistingColumnsRequest.setId(ids);
+ dropNonExistingColumnsRequest.setColumns(List.of("non_existing_column"));
+ Exception dropColumnException =
+ Assertions.assertThrows(
+ Exception.class,
+ () ->
+ tableApi.alterTableDropColumns(
+ String.join(delimiter, ids),
dropNonExistingColumnsRequest, delimiter));
+ Assertions.assertTrue(
+ dropColumnException
+ .getMessage()
+ .contains("Column non_existing_column does not exist in the
dataset"));
+ }
+
+ @Test
+ void testAlterColumns() throws Exception {
+ catalog = createCatalog(CATALOG_NAME);
+ createSchema();
+
+ String location = tempDir + "/" + "alter_columns/";
+ List<String> ids = List.of(CATALOG_NAME, SCHEMA_NAME,
"alter_columns_table");
+ org.apache.arrow.vector.types.pojo.Schema schema =
+ new org.apache.arrow.vector.types.pojo.Schema(
+ Arrays.asList(
+ Field.nullable("id", new ArrowType.Int(32, true)),
+ Field.nullable("value", new ArrowType.Utf8())));
+ byte[] body = ArrowUtils.generateIpcStream(schema);
+
+ CreateTableRequest request = new CreateTableRequest();
+ request.setId(ids);
+ request.setLocation(location);
+ request.setProperties(ImmutableMap.of("key1", "v1"));
+ ns.createTable(request, body);
+
+ RestNamespace restNamespace = (RestNamespace) ns;
+ TableApi tableApi = (TableApi) FieldUtils.readField(restNamespace,
"tableApi", true);
Review Comment:
The test uses reflection to access a private field "tableApi" in
RestNamespace. This creates a tight coupling to internal implementation details
and makes the test brittle. If the field name or structure changes, the test
will break. Consider either exposing the alterTableAlterColumns method through
the public Namespace interface, or using a dedicated test API client that
doesn't require reflection.
##########
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java:
##########
@@ -591,6 +597,101 @@ void testCreateTable() throws IOException, ApiException {
Set<String> stringSet = listResponse.getTables();
Assertions.assertEquals(1, stringSet.size());
Assertions.assertTrue(stringSet.contains(Joiner.on(".").join(ids)));
+
+ // Now try to drop columns in the table
+ AlterTableDropColumnsRequest dropColumnsRequest = new
AlterTableDropColumnsRequest();
+ dropColumnsRequest.setId(List.of(CATALOG_NAME, SCHEMA_NAME, "table"));
+ dropColumnsRequest.setColumns(List.of("value"));
+
+ // No alterTableDropColumns in Namespace interface, so we need to get
TableApi via reflection
+ RestNamespace restNamespace = (RestNamespace) ns;
+ TableApi tableApi = (TableApi) FieldUtils.readField(restNamespace,
"tableApi", true);
Review Comment:
The test uses reflection to access a private field "tableApi" in
RestNamespace. This creates a tight coupling to internal implementation details
and makes the test brittle. If the field name or structure changes, the test
will break. Consider either exposing the alterTableDropColumns and
alterTableAlterColumns methods through the public Namespace interface, or using
a dedicated test API client that doesn't require reflection.
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -320,9 +334,19 @@ private void addLanceIndex(Table table, List<Index>
addedIndexes) {
indexParams,
true);
}
+
+ if (!dropColumns.isEmpty()) {
+ dataset.dropColumns(dropColumns);
+ }
+
+ if (!renameColumns.isEmpty()) {
+ dataset.alterColumns(renameColumns);
+ }
+ } catch (RuntimeException e) {
+ throw e;
} catch (Exception e) {
throw new RuntimeException(
- "Failed to add indexes to Lance dataset at location " + location, e);
+ "Failed to handle alterations to Lance dataset at location " +
location, e);
}
Review Comment:
The catch block re-throws RuntimeException as-is but wraps other exceptions.
This means the error message "Failed to handle alterations to Lance dataset at
location" will only be shown for checked exceptions, while RuntimeExceptions
will propagate with their original message. This inconsistency could make
debugging harder. Consider either wrapping all exceptions consistently or
differentiating them with more specific handling logic.
##########
lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java:
##########
@@ -309,4 +373,28 @@ private JsonArrowSchema toJsonArrowSchema(Column[]
columns) {
return JsonArrowSchemaConverter.convertToJsonArrowSchema(
new org.apache.arrow.vector.types.pojo.Schema(fields));
}
+
+ private List<TableChange>
buildAlterColumnChanges(AlterTableAlterColumnsRequest request) {
+ List<ColumnAlteration> columns = request.getAlterations();
+
+ List<TableChange> changes = new ArrayList<>();
+ for (ColumnAlteration column : columns) {
+ // Column name will not be null according to LanceDB spec.
+ String columnName = column.getColumn();
+ String newName = column.getRename();
+ if (StringUtils.isNotBlank(newName)) {
+ changes.add(TableChange.renameColumn(new String[] {columnName},
newName));
+ }
+
+ // The format of ColumnAlteration#castTo is unclear, so we will skip it
now
+ // for more, please see:
+ //
https://github.com/lance-format/lance-namespace/blob/9d9cde12520caea2fd80ea5f41a20a4db9b92524/java/lance-namespace-apache-client/api/openapi.yaml#L4508-L4511
+ if (StringUtils.isNotBlank(column.getCastTo())) {
+ LOGGER.error(
+ "Altering column '{}' data type is not supported yet due to
unclear spec.", columnName);
+ throw new UnsupportedOperationException("Altering column data type is
not supported yet.");
+ }
+ }
+ return changes;
Review Comment:
The method buildAlterColumnChanges returns an empty list when a
ColumnAlteration has neither a rename operation nor a castTo operation. This
could indicate a silent no-op scenario where the caller expects changes to be
made but nothing happens. Consider validating that each ColumnAlteration has at
least one meaningful operation, or logging a warning when an alteration is
skipped.
##########
lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/rest/LanceTableOperations.java:
##########
@@ -268,4 +316,14 @@ private void
validateDropTableRequest(@SuppressWarnings("unused") DropTableReque
// We will ignore the id in the request body since it's already provided
in the path param
// No specific fields to validate for now
}
+
+ private void validateDropColumnsRequest(AlterTableDropColumnsRequest
request) {
+ Preconditions.checkArgument(
+ !request.getColumns().isEmpty(), "Columns to drop cannot be empty.");
+ }
+
+ private void validateAlterColumnsRequest(AlterTableAlterColumnsRequest
request) {
+ Preconditions.checkArgument(
+ !request.getAlterations().isEmpty(), "Columns to alter cannot be
empty.");
Review Comment:
The validation only checks if the alterations list is not empty, but doesn't
validate if the alterations list or its elements are null. If
request.getAlterations() returns null, this will throw a NullPointerException.
Consider adding a null check before calling isEmpty().
##########
lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java:
##########
@@ -284,6 +297,57 @@ public DropTableResponse dropTable(String tableId, String
delimiter) {
return response;
}
+ @Override
+ public AlterTableDropColumnsResponse alterTableDropColumns(
+ String tableId, String delimiter, AlterTableDropColumnsRequest request) {
+ ObjectIdentifier nsId = ObjectIdentifier.of(tableId,
Pattern.quote(delimiter));
+ Preconditions.checkArgument(
+ nsId.levels() == 3, "Expected at 3-level namespace but got: %s",
nsId.levels());
+
+ String catalogName = nsId.levelAtListPos(0);
+ Catalog catalog =
namespaceWrapper.loadAndValidateLakehouseCatalog(catalogName);
+
+ NameIdentifier tableIdentifier =
+ NameIdentifier.of(nsId.levelAtListPos(1), nsId.levelAtListPos(2));
+
+ TableChange[] changes =
+ request.getColumns().stream()
+ .map(colName -> TableChange.deleteColumn(new String[] {colName},
false))
+ .toArray(TableChange[]::new);
+
+ catalog.asTableCatalog().alterTable(tableIdentifier, changes);
+
+ return new AlterTableDropColumnsResponse();
Review Comment:
The response object is returned without setting a version field. Looking at
other methods in this class (like describeTable at line 106 and createTable at
line 161), responses typically have version set to null. For consistency with
the rest of the codebase, consider setting the version field explicitly on the
response object.
```suggestion
AlterTableDropColumnsResponse response = new
AlterTableDropColumnsResponse();
response.setVersion(null);
return response;
```
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -303,11 +287,41 @@ private org.apache.arrow.vector.types.pojo.Schema
convertColumnsToArrowSchema(Co
return new org.apache.arrow.vector.types.pojo.Schema(fields);
}
- private void addLanceIndex(Table table, List<Index> addedIndexes) {
+ // Note: this method can't guarantee the atomicity of the operations on
Lance dataset. For
+ // example, only a subset of changes may be applied if an exception occurs
during the process.
+ private void handleLanceTableChange(Table table, TableChange[] changes) {
+ List<String> dropColumns = Lists.newArrayList();
+ List<Index> indexToAdd = Lists.newArrayList();
+ List<ColumnAlteration> renameColumns = Lists.newArrayList();
+
+ for (TableChange change : changes) {
+ if (change instanceof TableChange.DeleteColumn deleteColumn) {
+ dropColumns.add(deleteColumn.fieldName()[0]);
Review Comment:
The code assumes fieldName() returns a non-empty array and directly accesses
index [0]. If fieldName() returns an empty array or null, this will throw an
ArrayIndexOutOfBoundsException or NullPointerException. Consider adding
validation to check that fieldName() is not null and has at least one element
before accessing the first element.
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -159,28 +161,10 @@ public Table createTable(
@Override
public Table alterTable(NameIdentifier ident, TableChange... changes)
throws NoSuchSchemaException, TableAlreadyExistsException {
- // Lance only supports adding indexes for now.
- boolean onlyAddIndex =
- Arrays.stream(changes).allMatch(change -> change instanceof
TableChange.AddIndex);
- Preconditions.checkArgument(onlyAddIndex, "Only adding indexes is
supported for Lance tables");
-
- List<Index> addedIndexes =
- Arrays.stream(changes)
- .filter(change -> change instanceof TableChange.AddIndex)
- .map(
- change -> {
- TableChange.AddIndex addIndexChange = (TableChange.AddIndex)
change;
- return Indexes.IndexImpl.builder()
- .withIndexType(addIndexChange.getType())
- .withName(addIndexChange.getName())
- .withFieldNames(addIndexChange.getFieldNames())
- .build();
- })
- .collect(Collectors.toList());
Table loadedTable = super.loadTable(ident);
- addLanceIndex(loadedTable, addedIndexes);
- // After adding the index to the Lance dataset, we need to update the
table metadata in
+ handleLanceTableChange(loadedTable, changes);
+ // After making changes to the Lance dataset, we need to update the table
metadata in
// Gravitino. If there's any failure during this process, the code will
throw an exception
// and the update won't be applied in Gravitino.
return super.alterTable(ident, changes);
Review Comment:
The new column drop and rename functionality introduced in
handleLanceTableChange is not covered by tests in the catalog-lakehouse-generic
module. The existing test file TestLanceTableOperations only tests creation
mode validation. Consider adding unit or integration tests that verify the drop
column and rename column operations work correctly through the alterTable
method, including error scenarios like attempting to drop a non-existent column
or rename to an invalid name.
##########
lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java:
##########
@@ -284,6 +297,57 @@ public DropTableResponse dropTable(String tableId, String
delimiter) {
return response;
}
+ @Override
+ public AlterTableDropColumnsResponse alterTableDropColumns(
+ String tableId, String delimiter, AlterTableDropColumnsRequest request) {
+ ObjectIdentifier nsId = ObjectIdentifier.of(tableId,
Pattern.quote(delimiter));
+ Preconditions.checkArgument(
+ nsId.levels() == 3, "Expected at 3-level namespace but got: %s",
nsId.levels());
+
+ String catalogName = nsId.levelAtListPos(0);
+ Catalog catalog =
namespaceWrapper.loadAndValidateLakehouseCatalog(catalogName);
+
+ NameIdentifier tableIdentifier =
+ NameIdentifier.of(nsId.levelAtListPos(1), nsId.levelAtListPos(2));
+
+ TableChange[] changes =
+ request.getColumns().stream()
+ .map(colName -> TableChange.deleteColumn(new String[] {colName},
false))
+ .toArray(TableChange[]::new);
+
+ catalog.asTableCatalog().alterTable(tableIdentifier, changes);
+
+ return new AlterTableDropColumnsResponse();
+ }
+
+ @Override
+ public AlterTableAlterColumnsResponse alterTableAlterColumns(
+ String tableId, String delimiter, AlterTableAlterColumnsRequest request)
{
+ ObjectIdentifier nsId = ObjectIdentifier.of(tableId,
Pattern.quote(delimiter));
+ Preconditions.checkArgument(
+ nsId.levels() == 3, "Expected at 3-level namespace but got: %s",
nsId.levels());
+
+ String catalogName = nsId.levelAtListPos(0);
+ Catalog catalog =
namespaceWrapper.loadAndValidateLakehouseCatalog(catalogName);
+
+ NameIdentifier tableIdentifier =
+ NameIdentifier.of(nsId.levelAtListPos(1), nsId.levelAtListPos(2));
+
+ List<TableChange> changes = buildAlterColumnChanges(request);
+ catalog.asTableCatalog().alterTable(tableIdentifier, changes.toArray(new
TableChange[0]));
+
+ return new AlterTableAlterColumnsResponse();
Review Comment:
The response object is returned without setting a version field. Looking at
other methods in this class (like describeTable at line 106 and createTable at
line 161), responses typically have version set to null. For consistency with
the rest of the codebase, consider setting the version field explicitly on the
response object.
##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -303,11 +287,41 @@ private org.apache.arrow.vector.types.pojo.Schema
convertColumnsToArrowSchema(Co
return new org.apache.arrow.vector.types.pojo.Schema(fields);
}
- private void addLanceIndex(Table table, List<Index> addedIndexes) {
+ // Note: this method can't guarantee the atomicity of the operations on
Lance dataset. For
+ // example, only a subset of changes may be applied if an exception occurs
during the process.
+ private void handleLanceTableChange(Table table, TableChange[] changes) {
Review Comment:
The comment acknowledges that this method cannot guarantee atomicity of
operations on the Lance dataset. This is a significant concern as partial
application of changes could leave the dataset in an inconsistent state.
Consider implementing a transaction-like mechanism or rollback capability, or
at minimum, order the operations from least to most risky (e.g., rename before
drop) to minimize the impact of partial failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]