flyrain commented on code in PR #1037:
URL: https://github.com/apache/polaris/pull/1037#discussion_r2011020421
##########
service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java:
##########
@@ -1197,20 +1213,23 @@ public void doRefresh() {
PolarisResolvedPathWrapper resolvedEntities =
resolvedEntityView.getPassthroughResolvedPath(
tableIdentifier, PolarisEntityType.ICEBERG_TABLE_LIKE,
PolarisEntitySubType.TABLE);
- IcebergTableLikeEntity entity = null;
if (resolvedEntities != null) {
- entity =
IcebergTableLikeEntity.of(resolvedEntities.getRawLeafEntity());
- if (!tableIdentifier.equals(entity.getTableIdentifier())) {
+ this.tableEntity =
IcebergTableLikeEntity.of(resolvedEntities.getRawLeafEntity());
+ if (!tableIdentifier.equals(tableEntity.getTableIdentifier())) {
LOGGER
.atError()
- .addKeyValue("entity.getTableIdentifier()",
entity.getTableIdentifier())
+ .addKeyValue("entity.getTableIdentifier()",
tableEntity.getTableIdentifier())
.addKeyValue("tableIdentifier", tableIdentifier)
.log("Stored table identifier mismatches requested identifier");
Review Comment:
Should we throw or return here? The tableEntity isn't correct in this case,
we should not move forward with it. Not a blocker though, the logic wasn't
introduced by this PR.
##########
service/common/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java:
##########
@@ -1374,9 +1393,9 @@ public void doCommit(TableMetadata base, TableMetadata
metadata) {
tableIdentifier, oldLocation, newLocation, existingLocation);
}
if (null == existingLocation) {
- createTableLike(tableIdentifier, entity);
+ this.tableEntity = createTableLike(tableIdentifier, entity);
} else {
- updateTableLike(tableIdentifier, entity);
+ this.tableEntity = updateTableLike(tableIdentifier, entity);
Review Comment:
I share the same concern as @eric-maynard — it's risky when the `etag` gets
ahead of `current`. This creates a false sense for the client that it's
operating on the current metadata, when in fact it's not. The mismatch between
`IcebergTableLikeEntity` and the actual table metadata drives me a bit crazy.
The current logic feels fragile: even if we verify there's no concurrency issue
now, it's error-prone and susceptible to future regressions if assumptions
change.
I'd strongly recommend updating `BasePolarisTableOperations` to return
`etag`-related properties alongside the current metadata, and clearly document
that these must *never* be separated. The best place to enforce this would be
within the method `refreshFromMetadataLocation()` or its parameter
`metadataLoader`.
One potential solution is to compute a hash based on the Iceberg table
metadata properties, such as:
```json
{
"last-sequence-number": 34,
"last-updated-ms": 1602638573590,
"current-snapshot-id": 3055729675574597004
}
```
Alternatively, if we really want to leverage the Polaris entity ID and
version, we could:
1. Inject them into the `"properties"` map and strip them out when unpacking
(though we'd need to carefully pick a key).
2. Introduce a wrapper that bundles `TableMetadata` and `PolarisEntity`
together, ensuring consistency. This would require more extensive refactoring
though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]