Copilot commented on code in PR #9127:
URL: https://github.com/apache/gravitino/pull/9127#discussion_r2671972514


##########
lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/rest/LanceTableOperations.java:
##########
@@ -268,4 +316,14 @@ private void 
validateDropTableRequest(@SuppressWarnings("unused") DropTableReque
     // We will ignore the id in the request body since it's already provided 
in the path param
     // No specific fields to validate for now
   }
+
+  private void validateDropColumnsRequest(AlterTableDropColumnsRequest 
request) {
+    Preconditions.checkArgument(
+        !request.getColumns().isEmpty(), "Columns to drop cannot be empty.");

Review Comment:
   The validation only checks if the columns list is not empty, but doesn't 
validate if the columns list or its elements are null. If request.getColumns() 
returns null, this will throw a NullPointerException. Consider adding a null 
check before calling isEmpty().
   ```suggestion
           request.getColumns() != null && !request.getColumns().isEmpty(),
           "Columns to drop cannot be empty.");
   ```



##########
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java:
##########
@@ -591,6 +597,101 @@ void testCreateTable() throws IOException, ApiException {
     Set<String> stringSet = listResponse.getTables();
     Assertions.assertEquals(1, stringSet.size());
     Assertions.assertTrue(stringSet.contains(Joiner.on(".").join(ids)));
+
+    // Now try to drop columns in the table
+    AlterTableDropColumnsRequest dropColumnsRequest = new 
AlterTableDropColumnsRequest();
+    dropColumnsRequest.setId(List.of(CATALOG_NAME, SCHEMA_NAME, "table"));
+    dropColumnsRequest.setColumns(List.of("value"));
+
+    // No alterTableDropColumns in Namespace interface, so we need to get 
TableApi via reflection
+    RestNamespace restNamespace = (RestNamespace) ns;
+    TableApi tableApi = (TableApi) FieldUtils.readField(restNamespace, 
"tableApi", true);
+    String delimiter = RestNamespaceConfig.DELIMITER_DEFAULT;
+
+    Assertions.assertDoesNotThrow(
+        () ->
+            tableApi.alterTableDropColumns(
+                String.join(delimiter, ids), dropColumnsRequest, delimiter));
+
+    describeTableRequest.setId(ids);
+    loadTable = ns.describeTable(describeTableRequest);
+    Assertions.assertNotNull(loadTable);
+    Assertions.assertEquals(newLocation, loadTable.getLocation());
+
+    jsonArrowFields = loadTable.getSchema().getFields();
+    Assertions.assertEquals(1, jsonArrowFields.size());
+    for (int i = 0; i < jsonArrowFields.size(); i++) {
+      JsonArrowField jsonArrowField = jsonArrowFields.get(i);
+      Field originalField = schema.getFields().get(i);
+      Assertions.assertEquals(originalField.getName(), 
jsonArrowField.getName());
+
+      if (i == 0) {
+        Assertions.assertEquals("int32", jsonArrowField.getType().getType());
+      } else if (i == 1) {
+        Assertions.assertEquals("utf8", jsonArrowField.getType().getType());
+      }
+    }
+
+    // Drop a non-existing column should fail
+    AlterTableDropColumnsRequest dropNonExistingColumnsRequest = new 
AlterTableDropColumnsRequest();
+    dropNonExistingColumnsRequest.setId(ids);
+    dropNonExistingColumnsRequest.setColumns(List.of("non_existing_column"));
+    Exception dropColumnException =
+        Assertions.assertThrows(
+            Exception.class,
+            () ->
+                tableApi.alterTableDropColumns(
+                    String.join(delimiter, ids), 
dropNonExistingColumnsRequest, delimiter));
+    Assertions.assertTrue(
+        dropColumnException
+            .getMessage()
+            .contains("Column non_existing_column does not exist in the 
dataset"));
+  }
+
+  @Test
+  void testAlterColumns() throws Exception {
+    catalog = createCatalog(CATALOG_NAME);
+    createSchema();
+
+    String location = tempDir + "/" + "alter_columns/";
+    List<String> ids = List.of(CATALOG_NAME, SCHEMA_NAME, 
"alter_columns_table");
+    org.apache.arrow.vector.types.pojo.Schema schema =
+        new org.apache.arrow.vector.types.pojo.Schema(
+            Arrays.asList(
+                Field.nullable("id", new ArrowType.Int(32, true)),
+                Field.nullable("value", new ArrowType.Utf8())));
+    byte[] body = ArrowUtils.generateIpcStream(schema);
+
+    CreateTableRequest request = new CreateTableRequest();
+    request.setId(ids);
+    request.setLocation(location);
+    request.setProperties(ImmutableMap.of("key1", "v1"));
+    ns.createTable(request, body);
+
+    RestNamespace restNamespace = (RestNamespace) ns;
+    TableApi tableApi = (TableApi) FieldUtils.readField(restNamespace, 
"tableApi", true);

Review Comment:
   The test uses reflection to access a private field "tableApi" in 
RestNamespace. This creates a tight coupling to internal implementation details 
and makes the test brittle. If the field name or structure changes, the test 
will break. Consider either exposing the alterTableAlterColumns method through 
the public Namespace interface, or using a dedicated test API client that 
doesn't require reflection.



##########
lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java:
##########
@@ -591,6 +597,101 @@ void testCreateTable() throws IOException, ApiException {
     Set<String> stringSet = listResponse.getTables();
     Assertions.assertEquals(1, stringSet.size());
     Assertions.assertTrue(stringSet.contains(Joiner.on(".").join(ids)));
+
+    // Now try to drop columns in the table
+    AlterTableDropColumnsRequest dropColumnsRequest = new 
AlterTableDropColumnsRequest();
+    dropColumnsRequest.setId(List.of(CATALOG_NAME, SCHEMA_NAME, "table"));
+    dropColumnsRequest.setColumns(List.of("value"));
+
+    // No alterTableDropColumns in Namespace interface, so we need to get 
TableApi via reflection
+    RestNamespace restNamespace = (RestNamespace) ns;
+    TableApi tableApi = (TableApi) FieldUtils.readField(restNamespace, 
"tableApi", true);

Review Comment:
   The test uses reflection to access a private field "tableApi" in 
RestNamespace. This creates a tight coupling to internal implementation details 
and makes the test brittle. If the field name or structure changes, the test 
will break. Consider either exposing the alterTableDropColumns and 
alterTableAlterColumns methods through the public Namespace interface, or using 
a dedicated test API client that doesn't require reflection.



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -320,9 +334,19 @@ private void addLanceIndex(Table table, List<Index> 
addedIndexes) {
             indexParams,
             true);
       }
+
+      if (!dropColumns.isEmpty()) {
+        dataset.dropColumns(dropColumns);
+      }
+
+      if (!renameColumns.isEmpty()) {
+        dataset.alterColumns(renameColumns);
+      }
+    } catch (RuntimeException e) {
+      throw e;
     } catch (Exception e) {
       throw new RuntimeException(
-          "Failed to add indexes to Lance dataset at location " + location, e);
+          "Failed to handle alterations to Lance dataset at location " + 
location, e);
     }

Review Comment:
   The catch block re-throws RuntimeException as-is but wraps other exceptions. 
This means the error message "Failed to handle alterations to Lance dataset at 
location" will only be shown for checked exceptions, while RuntimeExceptions 
will propagate with their original message. This inconsistency could make 
debugging harder. Consider either wrapping all exceptions consistently or 
differentiating them with more specific handling logic.



##########
lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java:
##########
@@ -309,4 +373,28 @@ private JsonArrowSchema toJsonArrowSchema(Column[] 
columns) {
     return JsonArrowSchemaConverter.convertToJsonArrowSchema(
         new org.apache.arrow.vector.types.pojo.Schema(fields));
   }
+
+  private List<TableChange> 
buildAlterColumnChanges(AlterTableAlterColumnsRequest request) {
+    List<ColumnAlteration> columns = request.getAlterations();
+
+    List<TableChange> changes = new ArrayList<>();
+    for (ColumnAlteration column : columns) {
+      // Column name will not be null according to LanceDB spec.
+      String columnName = column.getColumn();
+      String newName = column.getRename();
+      if (StringUtils.isNotBlank(newName)) {
+        changes.add(TableChange.renameColumn(new String[] {columnName}, 
newName));
+      }
+
+      // The format of ColumnAlteration#castTo is unclear, so we will skip it 
now
+      // for more, please see:
+      // 
https://github.com/lance-format/lance-namespace/blob/9d9cde12520caea2fd80ea5f41a20a4db9b92524/java/lance-namespace-apache-client/api/openapi.yaml#L4508-L4511
+      if (StringUtils.isNotBlank(column.getCastTo())) {
+        LOGGER.error(
+            "Altering column '{}' data type is not supported yet due to 
unclear spec.", columnName);
+        throw new UnsupportedOperationException("Altering column data type is 
not supported yet.");
+      }
+    }
+    return changes;

Review Comment:
   The method buildAlterColumnChanges returns an empty list when a 
ColumnAlteration has neither a rename operation nor a castTo operation. This 
could indicate a silent no-op scenario where the caller expects changes to be 
made but nothing happens. Consider validating that each ColumnAlteration has at 
least one meaningful operation, or logging a warning when an alteration is 
skipped.



##########
lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/rest/LanceTableOperations.java:
##########
@@ -268,4 +316,14 @@ private void 
validateDropTableRequest(@SuppressWarnings("unused") DropTableReque
     // We will ignore the id in the request body since it's already provided 
in the path param
     // No specific fields to validate for now
   }
+
+  private void validateDropColumnsRequest(AlterTableDropColumnsRequest 
request) {
+    Preconditions.checkArgument(
+        !request.getColumns().isEmpty(), "Columns to drop cannot be empty.");
+  }
+
+  private void validateAlterColumnsRequest(AlterTableAlterColumnsRequest 
request) {
+    Preconditions.checkArgument(
+        !request.getAlterations().isEmpty(), "Columns to alter cannot be 
empty.");

Review Comment:
   The validation only checks if the alterations list is not empty, but doesn't 
validate if the alterations list or its elements are null. If 
request.getAlterations() returns null, this will throw a NullPointerException. 
Consider adding a null check before calling isEmpty().



##########
lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java:
##########
@@ -284,6 +297,57 @@ public DropTableResponse dropTable(String tableId, String 
delimiter) {
     return response;
   }
 
+  @Override
+  public AlterTableDropColumnsResponse alterTableDropColumns(
+      String tableId, String delimiter, AlterTableDropColumnsRequest request) {
+    ObjectIdentifier nsId = ObjectIdentifier.of(tableId, 
Pattern.quote(delimiter));
+    Preconditions.checkArgument(
+        nsId.levels() == 3, "Expected at 3-level namespace but got: %s", 
nsId.levels());
+
+    String catalogName = nsId.levelAtListPos(0);
+    Catalog catalog = 
namespaceWrapper.loadAndValidateLakehouseCatalog(catalogName);
+
+    NameIdentifier tableIdentifier =
+        NameIdentifier.of(nsId.levelAtListPos(1), nsId.levelAtListPos(2));
+
+    TableChange[] changes =
+        request.getColumns().stream()
+            .map(colName -> TableChange.deleteColumn(new String[] {colName}, 
false))
+            .toArray(TableChange[]::new);
+
+    catalog.asTableCatalog().alterTable(tableIdentifier, changes);
+
+    return new AlterTableDropColumnsResponse();

Review Comment:
   The response object is returned without setting a version field. Looking at 
other methods in this class (like describeTable at line 106 and createTable at 
line 161), responses typically have version set to null. For consistency with 
the rest of the codebase, consider setting the version field explicitly on the 
response object.
   ```suggestion
       AlterTableDropColumnsResponse response = new 
AlterTableDropColumnsResponse();
       response.setVersion(null);
       return response;
   ```



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -303,11 +287,41 @@ private org.apache.arrow.vector.types.pojo.Schema 
convertColumnsToArrowSchema(Co
     return new org.apache.arrow.vector.types.pojo.Schema(fields);
   }
 
-  private void addLanceIndex(Table table, List<Index> addedIndexes) {
+  // Note: this method can't guarantee the atomicity of the operations on 
Lance dataset. For
+  // example, only a subset of changes may be applied if an exception occurs 
during the process.
+  private void handleLanceTableChange(Table table, TableChange[] changes) {
+    List<String> dropColumns = Lists.newArrayList();
+    List<Index> indexToAdd = Lists.newArrayList();
+    List<ColumnAlteration> renameColumns = Lists.newArrayList();
+
+    for (TableChange change : changes) {
+      if (change instanceof TableChange.DeleteColumn deleteColumn) {
+        dropColumns.add(deleteColumn.fieldName()[0]);

Review Comment:
   The code assumes fieldName() returns a non-empty array and directly accesses 
index [0]. If fieldName() returns an empty array or null, this will throw an 
ArrayIndexOutOfBoundsException or NullPointerException. Consider adding 
validation to check that fieldName() is not null and has at least one element 
before accessing the first element.



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -159,28 +161,10 @@ public Table createTable(
   @Override
   public Table alterTable(NameIdentifier ident, TableChange... changes)
       throws NoSuchSchemaException, TableAlreadyExistsException {
-    // Lance only supports adding indexes for now.
-    boolean onlyAddIndex =
-        Arrays.stream(changes).allMatch(change -> change instanceof 
TableChange.AddIndex);
-    Preconditions.checkArgument(onlyAddIndex, "Only adding indexes is 
supported for Lance tables");
-
-    List<Index> addedIndexes =
-        Arrays.stream(changes)
-            .filter(change -> change instanceof TableChange.AddIndex)
-            .map(
-                change -> {
-                  TableChange.AddIndex addIndexChange = (TableChange.AddIndex) 
change;
-                  return Indexes.IndexImpl.builder()
-                      .withIndexType(addIndexChange.getType())
-                      .withName(addIndexChange.getName())
-                      .withFieldNames(addIndexChange.getFieldNames())
-                      .build();
-                })
-            .collect(Collectors.toList());
 
     Table loadedTable = super.loadTable(ident);
-    addLanceIndex(loadedTable, addedIndexes);
-    // After adding the index to the Lance dataset, we need to update the 
table metadata in
+    handleLanceTableChange(loadedTable, changes);
+    // After making changes to the Lance dataset, we need to update the table 
metadata in
     // Gravitino. If there's any failure during this process, the code will 
throw an exception
     // and the update won't be applied in Gravitino.
     return super.alterTable(ident, changes);

Review Comment:
   The new column drop and rename functionality introduced in 
handleLanceTableChange is not covered by tests in the catalog-lakehouse-generic 
module. The existing test file TestLanceTableOperations only tests creation 
mode validation. Consider adding unit or integration tests that verify the drop 
column and rename column operations work correctly through the alterTable 
method, including error scenarios like attempting to drop a non-existent column 
or rename to an invalid name.



##########
lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java:
##########
@@ -284,6 +297,57 @@ public DropTableResponse dropTable(String tableId, String 
delimiter) {
     return response;
   }
 
+  @Override
+  public AlterTableDropColumnsResponse alterTableDropColumns(
+      String tableId, String delimiter, AlterTableDropColumnsRequest request) {
+    ObjectIdentifier nsId = ObjectIdentifier.of(tableId, 
Pattern.quote(delimiter));
+    Preconditions.checkArgument(
+        nsId.levels() == 3, "Expected at 3-level namespace but got: %s", 
nsId.levels());
+
+    String catalogName = nsId.levelAtListPos(0);
+    Catalog catalog = 
namespaceWrapper.loadAndValidateLakehouseCatalog(catalogName);
+
+    NameIdentifier tableIdentifier =
+        NameIdentifier.of(nsId.levelAtListPos(1), nsId.levelAtListPos(2));
+
+    TableChange[] changes =
+        request.getColumns().stream()
+            .map(colName -> TableChange.deleteColumn(new String[] {colName}, 
false))
+            .toArray(TableChange[]::new);
+
+    catalog.asTableCatalog().alterTable(tableIdentifier, changes);
+
+    return new AlterTableDropColumnsResponse();
+  }
+
+  @Override
+  public AlterTableAlterColumnsResponse alterTableAlterColumns(
+      String tableId, String delimiter, AlterTableAlterColumnsRequest request) 
{
+    ObjectIdentifier nsId = ObjectIdentifier.of(tableId, 
Pattern.quote(delimiter));
+    Preconditions.checkArgument(
+        nsId.levels() == 3, "Expected at 3-level namespace but got: %s", 
nsId.levels());
+
+    String catalogName = nsId.levelAtListPos(0);
+    Catalog catalog = 
namespaceWrapper.loadAndValidateLakehouseCatalog(catalogName);
+
+    NameIdentifier tableIdentifier =
+        NameIdentifier.of(nsId.levelAtListPos(1), nsId.levelAtListPos(2));
+
+    List<TableChange> changes = buildAlterColumnChanges(request);
+    catalog.asTableCatalog().alterTable(tableIdentifier, changes.toArray(new 
TableChange[0]));
+
+    return new AlterTableAlterColumnsResponse();

Review Comment:
   The response object is returned without setting a version field. Looking at 
other methods in this class (like describeTable at line 106 and createTable at 
line 161), responses typically have version set to null. For consistency with 
the rest of the codebase, consider setting the version field explicitly on the 
response object.



##########
catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java:
##########
@@ -303,11 +287,41 @@ private org.apache.arrow.vector.types.pojo.Schema 
convertColumnsToArrowSchema(Co
     return new org.apache.arrow.vector.types.pojo.Schema(fields);
   }
 
-  private void addLanceIndex(Table table, List<Index> addedIndexes) {
+  // Note: this method can't guarantee the atomicity of the operations on 
Lance dataset. For
+  // example, only a subset of changes may be applied if an exception occurs 
during the process.
+  private void handleLanceTableChange(Table table, TableChange[] changes) {

Review Comment:
   The comment acknowledges that this method cannot guarantee atomicity of 
operations on the Lance dataset. This is a significant concern as partial 
application of changes could leave the dataset in an inconsistent state. 
Consider implementing a transaction-like mechanism or rollback capability, or 
at minimum, order the operations from least to most risky (e.g., rename before 
drop) to minimize the impact of partial failures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to