mbutrovich commented on PR #2383:
URL: https://github.com/apache/iceberg-rust/pull/2383#issuecomment-4382066041

   Summary of reviewing with Claude:
   
   > ## Correctness
   > 
   > ### 1. Missing `KEY_TIMESTAMP` on a KEK should fail-fast, not silently 
pass `None` AAD
   > 
   > `encryption_manager.rs:500-503` (in `decrypt_dek`) and the symmetric wrap 
site (`:350-352`):
   > 
   > ```rust
   > let aad = kek.properties().get(KEK_CREATED_AT_PROPERTY).map(|ts| 
ts.as_bytes());
   > ```
   > 
   > Java asserts `Preconditions.checkState(keyEncryptionKeyTimestamp != null, 
"Key encryption key must be timestamped")` (`EncryptionUtil.java:161-162`). 
Here a KEK missing the timestamp silently passes `None` on both wrap and unwrap 
— the pair roundtrips, but the tampering defense is gone. And if one side has 
the property and the other doesn't, you get a generic AES-GCM auth failure 
instead of a clear "corrupted KEK" error.
   > 
   > Suggest: require the property in both paths and return 
`ErrorKind::DataInvalid` with a clear message if it's missing. `create_kek` 
always sets it, so the only way to hit this is a malformed table — which is 
exactly the case we want to surface loudly.
   > 
   > ### 2. Manager doesn't internalize newly created KEKs — deviates from the 
RFC's write flow
   > 
   > RFC §"Write Path", step 2.e: *"Store as EncryptedKey (encrypted_by_id = 
kek_id) **in encryption manager**."*
   > 
   > Today `wrap_key_metadata` takes `&self`, `encryption_keys` is frozen at 
builder time, and the new KEK is returned to the caller to persist. In the 
single-manifest-list-per-commit flow this is fine — but it does mean the RFC 
and implementation disagree.
   > 
   > Two reasonable resolutions:
   > - Update the RFC wording to reflect the current design (caller persists 
the returned new KEK) — I actually like this better, it's a cleaner ownership 
story than Java's hidden mutation.
   > - Or use interior mutability (e.g. `ArcSwap<HashMap<...>>` / 
`RwLock<...>`) so the manager can be reused across calls in one commit.
   > 
   > Either way, worth resolving with the RFC before this lands so `[6/N]` 
onward doesn't have to walk back either side.
   > 
   > ## API / naming
   > 
   > ### 3. `wrap_key_metadata` return type
   > 
   > The `Option<EncryptedKey>` in `(EncryptedKey, Option<EncryptedKey>)` has 
subtle "persist-this-or-lose-it" semantics that's easy to miss at call sites. A 
named struct makes it self-documenting:
   > 
   > ```rust
   > pub struct WrappedKeyMetadata {
   >     pub entry: EncryptedKey,
   >     pub new_kek: Option<EncryptedKey>, // Some only when a KEK was 
created/rotated
   > }
   > ```
   > 
   > ### 4. Public constant name
   > 
   > `KEK_CREATED_AT_PROPERTY` holds the value `"KEY_TIMESTAMP"` — which 
matches Java. But the Rust identifier diverges from Java's `KEY_TIMESTAMP` 
constant. Since this is public API, matching Java's identifier name makes 
cross-reading easier.
   > 
   > ### 5. `EncryptedInputFile` / `EncryptedOutputFile` are AGS1-only
   > 
   > Names suggest generality, docstring says "AGS1 stream-encrypted". Java 
disambiguates with `AesGcmInputFile` vs `NativeEncryptionInputFile`. Since the 
RFC also has PME/native paths coming, consider `AesGcmStreamInputFile` / 
`AesGcmStreamOutputFile` to leave room.
   > 
   > ### 6. RFC/impl naming drift
   > 
   > RFC §"EncryptionManager" lists `encrypt_native()` / `generate_dek()`; the 
PR has `generate_native_key_metadata()` (returning the full 
`StandardKeyMetadata`). Which way should the RFC move? Worth aligning in one 
place.
   > 
   > ## Behavioral divergences from Java (not spec-mandated, just worth a 
comment)
   > 
   > ### 7. `find_active_kek` selection order
   > 
   > `max_by_key(timestamp)` picks the newest unexpired KEK. Java picks the 
first unexpired one by `LinkedHashMap` insertion order 
(`StandardEncryptionManager.java:162-170`). The Rust choice is deterministic 
(HashMap iteration isn't, so `max_by_key` is necessary) and arguably more 
correct. A one-line comment explaining the intentional divergence would help.
   > 
   > ### 8. KEK `key_id` format
   > 
   > Rust: `Uuid::new_v4().to_string()`. Java: 16 random bytes, Base64-encoded 
(`generateKeyId()` at `StandardEncryptionManager.java:228-232`). Spec doesn't 
constrain format. Interop works since both sides just compare strings, but 
worth noting for tables that might be read by Java tooling.
   > 
   > ## Tests
   > 
   > ### 9. Missing coverage
   > 
   > - **AAD tampering**: mutate a KEK's `KEY_TIMESTAMP` property in table 
metadata, assert `unwrap_key_metadata` fails. This exercises the tampering 
defense the timestamp-as-AAD scheme is designed to catch.
   > - **Missing-timestamp rejection**: once #1 above is fixed, add a test 
asserting a KEK without `KEY_TIMESTAMP` fails wrap and unwrap with a clear 
error.
   > - **`wrap_key_metadata` twice on same manager with empty initial 
`encryption_keys`**: currently `test_kek_reuse_when_not_expired` constructs a 
second manager with the first KEK pre-populated, which is exactly the path that 
works. Testing the same-manager case would pin down whichever decision comes 
out of #2.
   > 
   > ## Nits
   > 
   > - `encryption_manager.rs:498` comment says *"The Iceberg Spec uses the KEK 
timestamp as AAD"*. The spec doesn't mandate this — it's a Java implementation 
choice that we're matching. Reword as "matches Java's wrap scheme" to avoid 
overclaiming the spec.
   > - `crypto.rs:180-186` — `TryFrom<SensitiveBytes> for SecureKey` is fine, 
but `wrap_dek_with_kek` / `unwrap_dek_with_kek` (`encryption_manager.rs:530, 
542`) clone the `SensitiveBytes` just to call `try_from`, duplicating zeroing 
key material into a second buffer. A `TryFrom<&SensitiveBytes>` impl (or a 
`&SensitiveBytes` `AsRef`-style path) would avoid the clone.
   > - `decrypt_dek` wraps the inner error with *"Failed to unwrap key metadata 
with KEK..."* — the caller (`unwrap_key_metadata`) is already named for this 
context, so the wrapper message is mostly redundant. Not worth changing on its 
own, but consider when you touch that error path.
   > - Confirm `uuid` was added to `Cargo.toml` as part of this PR (not visible 
in the diff).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to