Hugo-WB commented on code in PR #16353:
URL: https://github.com/apache/iceberg/pull/16353#discussion_r3430033839


##########
core/src/main/java/org/apache/iceberg/encryption/EncryptionUtil.java:
##########
@@ -181,6 +185,37 @@ public static Map<String, EncryptedKey> 
encryptionKeys(EncryptionManager em) {
     return sem.encryptionKeys();
   }
 
+  /**
+   * Returns {@link TableMetadata} with encrypted keys from an {@param 
encryptionManager} that are
+   * required to read the snapshots in the given {@param metadata}. Only adds 
keys that are still
+   * referenced in metadata.
+   */
+  public static TableMetadata addEmKeysToMetadata(
+      TableMetadata metadata, EncryptionManager encryptionManager) {
+    if (!(encryptionManager instanceof StandardEncryptionManager 
standardEncryptionManager)) {
+      return metadata;
+    }
+    Set<String> referencedEncryptionKeys =
+        Sets.union(
+            metadata.snapshots().stream()
+                .map(Snapshot::keyId)
+                .filter(Objects::nonNull)
+                .collect(Collectors.toUnmodifiableSet()),
+            metadata.encryptionKeys().stream()
+                .map(EncryptedKey::encryptedById)
+                .filter(Objects::nonNull)
+                .collect(Collectors.toUnmodifiableSet()));
+    TableMetadata.Builder metadataBuilder = TableMetadata.buildFrom(metadata);
+    standardEncryptionManager.encryptionKeys().values().stream()
+        .filter(
+            encryptedKey ->
+                referencedEncryptionKeys.contains(encryptedKey.keyId())
+                    // The KEK may not be referenced but must still be saved 
in the metadata.
+                    || 
standardEncryptionManager.keyEncryptionKeyID().equals(encryptedKey.keyId()))
+        .forEach(metadataBuilder::addEncryptionKey);
+    return metadataBuilder.build();

Review Comment:
   This new logic depends on an invariance that the `manifestListKey -> KEK -> 
table encryption key`. are all one "hop". E.g. the `->` means encryptedBy.
   
   If there was a world where a key in the encryptionKeys() list is encrypted 
by another key in the encryptionKey list, which is then in itself encrypted by 
the KEK, and they were all added in a single commit, this PR would fail to 
propagate the middle key.
   E.g.:
   ```
   EncryptionKeys:[{key: A, encryptedBy: B}, {key: B, encryptedBy: C}, {key: C, 
encryptedBy: KEK}]
   snapshot: [snapshotId: x, keyId: A]
   ```
   And all of the above happens within one commit.
   In the above code, we would not add key B to the metadata file. As none of 
the encryptionKeys will be part of `metadata.encryptionKeys()` as that will be 
empty. We will keep A and C, since A is referenced by snapshot and C is a KEK.
   
   TLDR: question @ggershinsky : we expect the depth of encryption key 
referencing from manifest list -> KEK to remain at a single hop right, we don't 
expect there to be another hop?



##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -245,18 +244,7 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 
     String newMetadataLocation;
     EncryptionManager encrManager = encryption();
-    if (encrManager instanceof StandardEncryptionManager) {
-      // Add new encryption keys to the metadata
-      TableMetadata.Builder builder = TableMetadata.buildFrom(metadata);
-      for (Map.Entry<String, EncryptedKey> entry :
-          EncryptionUtil.encryptionKeys(encrManager).entrySet()) {
-        builder.addEncryptionKey(entry.getValue());
-      }
-
-      tableMetadata = builder.build();
-    } else {
-      tableMetadata = metadata;
-    }
+    tableMetadata = EncryptionUtil.addEmKeysToMetadata(metadata, encrManager);

Review Comment:
   cc @smaheshwar-pltr for: https://github.com/apache/iceberg/pull/13225. If 
you could update your REST implementation to use this utility once it merges.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to