openinx commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r612904117
##########
File path: api/src/main/java/org/apache/iceberg/UpdateSchema.java
##########
@@ -384,4 +384,25 @@ default UpdateSchema updateColumn(String name,
Type.PrimitiveType newType, Strin
* with other additions, renames, or
updates.
*/
UpdateSchema unionByNameWith(Schema newSchema);
+
+ /**
+ * Add a new row identifier given the column name.
+ * <p>
+ * The column must be a required field of a primitive type.
+ * It must exist in the current schema to update, or is added as a part of
this update.
+ *
+ * @param columnName name of the column to add as a row identifier
+ * @return this for method chaining
+ */
+ UpdateSchema addRowIdentifier(String columnName);
Review comment:
Functionally, there should be no difference between the
`addIdentifierField`/`removeIdentifierField ` and `setIdentifierFields`, but
for the perspective of usability, if people want to reset to use a totally
different identifier fields then he/she will need to:
```java
Schema schema = ...
Set<Integer> newFieldIds = ...
UpdateSchema update = new SchemaUpdate(SCHEMA, SCHEMA_LAST_COLUMN_ID);
for (Integer identifier : schema.rowIdentifiers()) {
String columnName = schema.findColumnName(identifier);
Preconditions.checkNotNull(columnName, "Can not find the column with
field id %s", identifier);
update.deleteRowIdentifier(columnName);
}
for (Integer identifier : schema.rowIdentifiers()) {
String columnName = schema.findColumnName(identifier);
Preconditions.checkNotNull(columnName, "Can not find the column with
field id %s", identifier);
update.addRowIdentifier(columnName);
}
update.apply();
```
It does make it more difficult to use the API. Besides the
`setIdentifierFields` API, would you think it's necessary to introduce a
`Set<String> identifierColumns()` in `Schema` ? I find the schema will always
return the `fieldId` while the `UpdateSchema` will always use the `columnName`
then we have to do the conversion from `fieldId` to `columnName` everywhere.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]