smaheshwar-pltr commented on code in PR #13225:
URL: https://github.com/apache/iceberg/pull/13225#discussion_r2926614176
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -575,6 +584,7 @@ private Supplier<BaseTable> createTableSupplier(
Map::of,
mutationHeaders,
tableFileIO(context, tableConf, credentials),
+ keyManagementClient,
Review Comment:
Thanks for bringing this up!
I don't have all the context here, but some initial thoughts:
1. `key-metadata` is part of the `ContentFile` schema in the REST spec
https://github.com/apache/iceberg/blob/f865bac7c15dada3543015b993b633760579ad88/open-api/rest-catalog-open-api.yaml#L4493
so `FileScanTask`s returned by server-side scan planning _can_ carry the
encryption metadata that clients need to decrypt data files. See the
debugging screenshot from https://github.com/apache/iceberg/pull/15603:
<img width="1728" height="1095" alt="image"
src="https://github.com/user-attachments/assets/e5547c87-3e5c-4276-b264-2038031dd48a"
/>
2. This does require that REST implementations obtain the key metadata for
data files that clients have written, which, due to the nature of envelope
encryption, may limit custom server-side optimisations.
3. `fileIOForPlanId` catches my eye
https://github.com/apache/iceberg/blob/f865bac7c15dada3543015b993b633760579ad88/core/src/main/java/org/apache/iceberg/rest/RESTTableScan.java#L215
but, thinking about it more, I suspect that when executors read data files they
don't use the per-plan file IO but instead the table's IO via
`SerializableTable`, which feels odd if so. WDYT?
4. I think testing is a bit involved, as it requires the server-side catalog
to support encryption (or some custom interception to return key metadata in
scan tasks). I think it's doable though, and tests should pass without changes;
see https://github.com/apache/iceberg/pull/15603
(`TestRemoteScanPlanningWithEncryption`) for a passing proof-of-concept test
suite (I threw together JDBC encryption for the POC).
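
To make point 2 concrete, here is a minimal, self-contained sketch of generic
envelope encryption using only the JDK. It is an illustration under stated
assumptions, not Iceberg's key metadata format or encryption API: `ToyKms`,
`EncryptedFile`, `write`, and `read` are hypothetical names, and the "key
metadata" here is simply the wrapped data key. The point it shows is that a
reader holding only the ciphertext cannot decrypt; whoever plans the scan has
to hand back the per-file key metadata that the writer recorded.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical illustration only; not Iceberg's key metadata format or API. */
final class EnvelopeSketch {
  private static final int NONCE_LEN = 12; // bytes, usual AES-GCM nonce size
  private static final int TAG_BITS = 128; // AES-GCM authentication tag size

  /** Stand-in for a KMS client that holds the key-encryption key (KEK). */
  static final class ToyKms {
    private final SecretKey kek;

    ToyKms(byte[] kekBytes) {
      this.kek = new SecretKeySpec(kekBytes, "AES");
    }

    byte[] wrap(SecretKey dek) {
      try {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, kek);
        return cipher.wrap(dek);
      } catch (GeneralSecurityException e) {
        throw new IllegalStateException("wrap failed", e);
      }
    }

    SecretKey unwrap(byte[] wrapped) {
      try {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, kek);
        return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
      } catch (GeneralSecurityException e) {
        throw new IllegalStateException("unwrap failed", e);
      }
    }
  }

  /** What a writer records per file: ciphertext plus opaque key metadata. */
  static final class EncryptedFile {
    final byte[] ciphertext;      // nonce followed by AES-GCM output
    final ByteBuffer keyMetadata; // here: the data key wrapped by the KEK

    EncryptedFile(byte[] ciphertext, ByteBuffer keyMetadata) {
      this.ciphertext = ciphertext;
      this.keyMetadata = keyMetadata;
    }
  }

  /** Writer side: fresh data key per file, wrapped by the KMS. */
  static EncryptedFile write(ToyKms kms, byte[] plaintext) {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(128);
      SecretKey dek = generator.generateKey();

      byte[] nonce = new byte[NONCE_LEN];
      new SecureRandom().nextBytes(nonce);
      Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
      gcm.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(TAG_BITS, nonce));
      byte[] body = gcm.doFinal(plaintext);

      byte[] out = new byte[NONCE_LEN + body.length];
      System.arraycopy(nonce, 0, out, 0, NONCE_LEN);
      System.arraycopy(body, 0, out, NONCE_LEN, body.length);
      return new EncryptedFile(out, ByteBuffer.wrap(kms.wrap(dek)));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("encrypt failed", e);
    }
  }

  /** Reader side: needs the key metadata from the scan task to unwrap the data key. */
  static byte[] read(ToyKms kms, byte[] ciphertext, ByteBuffer keyMetadata) {
    try {
      byte[] wrapped = new byte[keyMetadata.remaining()];
      keyMetadata.duplicate().get(wrapped); // duplicate() leaves the caller's position alone
      SecretKey dek = kms.unwrap(wrapped);

      Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
      gcm.init(
          Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(TAG_BITS, ciphertext, 0, NONCE_LEN));
      return gcm.doFinal(ciphertext, NONCE_LEN, ciphertext.length - NONCE_LEN);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("decrypt failed", e);
    }
  }
}
```

In this toy model, a server that rewrites or re-plans files without access to
the writer's key metadata has nothing it can return to the client, which is the
constraint on server-side optimisations I had in mind above.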
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]