FANNG1 commented on PR #10498: URL: https://github.com/apache/gravitino/pull/10498#issuecomment-4118301045
I see two issues with the current approach: 1. The ETag returned by `CreateTableResponse` is not reusable for the default `loadTable` request. The Iceberg REST spec says the client may send the ETag received from `CreateTableResponse` or `LoadTableResponse` in `If-None-Match`. In this PR, `createTable` returns an ETag derived from `metadataLocation`, while `loadTable` compares against an ETag derived from `metadataLocation + snapshots`. For the default `snapshots=all` path, those values differ, so a client that follows the spec and reuses the ETag from create will not get `304 Not Modified` on a subsequent unchanged `loadTable`. 2. We may be leaving performance on the table by always calling `loadTable` before checking the ETag. If the backend catalog can expose the current `metadataLocation` cheaply, for example via something like `SupportsMetadataLocation`, we could compute/compare the ETag first and return `304` without loading full table metadata. That would make the optimization more effective for freshness-aware reads. This should stay as an optional fast path though: if the catalog does not support it, the server can fall back to the current `loadTable` flow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
