ggershinsky commented on code in PR #16527:
URL: https://github.com/apache/iceberg/pull/16527#discussion_r3303685254


##########
format/spec.md:
##########
@@ -1299,6 +1066,49 @@ Notes:
 
 1. The format of encrypted key metadata is determined by the table's encryption scheme and can be a wrapped format specific to the table's KMS provider.
 
+#### Standard Key Metadata
+
+The `key_metadata` field in manifest entries stores per-file encryption key material as a binary blob. To enable cross-implementation interoperability, the standard encryption scheme defines the following binary format for this field:
+
+```
+VersionByte Payload
+```
+
+where:
+
+* `VersionByte` is a single byte indicating the key metadata schema version. Currently, the only valid version is `0x01`.
+* `Payload` is an Avro binary-encoded record (not a container file — only the raw binary encoding of the fields) using the schema for the given version.
+
+The Avro schema for version 1 is a record with the following fields, in order:
+
+| Field name | Avro type | Required | Description |
+|---|---|---|---|
+| **`encryption_key`** | `bytes` | _required_ | The data encryption key (DEK) for this file. Must be 16, 24, or 32 bytes (corresponding to AES-128, AES-192, or AES-256). |
+| **`aad_prefix`** | `bytes` | _optional_ | Random AAD prefix used for encryption integrity protection. For [AES GCM Stream](gcm-stream-spec.md) files, the prefix is combined with a block index to form the per-block AAD. For [Parquet modular encryption](https://parquet.apache.org/docs/file-format/data-pages/encryption/), the prefix is passed as the `aad_file_unique` component. |

Review Comment:
   For Parquet, it is also passed as an AAD prefix parameter; it is then combined with AAD suffixes.
   `aad_file_unique` is something else (a Parquet-internal parameter).

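To make the distinction in the comment above concrete, here is a minimal Python sketch of how a Parquet module AAD is assembled, based on our reading of the Parquet modular encryption spec: the AAD prefix is concatenated with an AAD suffix, and `aad_file_unique` is only the first component of that suffix, followed by a one-byte module type and 2-byte little-endian ordinals. The helper name `parquet_module_aad` is hypothetical, and only two module-type codes (footer = 0, data page = 2) are shown; the codes and ordinal widths should be checked against the Parquet spec before relying on them.

```python
import struct
from typing import Optional

# Module-type codes as we read them in the Parquet encryption spec (subset only).
FOOTER = 0
DATA_PAGE = 2


def parquet_module_aad(
    aad_prefix: bytes,
    aad_file_unique: bytes,
    module_type: int,
    row_group: Optional[int] = None,
    column: Optional[int] = None,
    page: Optional[int] = None,
) -> bytes:
    """Hypothetical helper: module AAD = aad_prefix || aad_suffix."""
    # The suffix starts with the Parquet-internal file-unique id and the module type.
    suffix = aad_file_unique + bytes([module_type])
    if module_type != FOOTER:
        if row_group is None or column is None:
            raise ValueError("non-footer modules need row group and column ordinals")
        # Row group and column ordinals as 2-byte little-endian values.
        suffix += struct.pack("<hh", row_group, column)
        if page is not None:
            # Page-level modules also append the page ordinal.
            suffix += struct.pack("<h", page)
    # The AAD prefix (Iceberg's `aad_prefix`) is a separate input, prepended here.
    return aad_prefix + suffix
```

For example, `parquet_module_aad(b"P", b"U", DATA_PAGE, row_group=1, column=2, page=3)` yields `b"PU\x02\x01\x00\x02\x00\x03\x00"`: the prefix `P` and the file-unique id `U` occupy different positions in the AAD.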

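The version-1 layout quoted in the hunk (a single version byte followed by the raw Avro binary encoding of the record, with no container-file framing) can be sketched in Python without an Avro dependency. This sketch assumes each `_optional_` field is encoded as an Avro union of `null` and the field type with `null` as branch 0, which the quoted text does not state; `encode_key_metadata` and `decode_key_metadata` are hypothetical names.

```python
import io
from typing import Optional, Tuple

VERSION_1 = 0x01


def _write_long(buf: io.BytesIO, n: int) -> None:
    # Avro long: zigzag-encode, then emit 7 bits per byte, low bits first.
    z = (n << 1) ^ (n >> 63)
    while z >= 0x80:
        buf.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    buf.write(bytes([z]))


def _read_long(buf: io.BytesIO) -> int:
    z, shift = 0, 0
    while True:
        b = buf.read(1)
        if not b:
            raise ValueError("truncated varint")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return (z >> 1) ^ -(z & 1)  # zigzag decode
        shift += 7


def _write_bytes(buf: io.BytesIO, data: bytes) -> None:
    # Avro bytes: length as an Avro long, then the raw bytes.
    _write_long(buf, len(data))
    buf.write(data)


def _read_bytes(buf: io.BytesIO) -> bytes:
    n = _read_long(buf)
    if n < 0:
        raise ValueError("negative bytes length")
    data = buf.read(n)
    if len(data) != n:
        raise ValueError("truncated bytes field")
    return data


def _read_branch(buf: io.BytesIO) -> bool:
    # Assumed union ["null", T]: branch 0 = null, branch 1 = value present.
    branch = _read_long(buf)
    if branch not in (0, 1):
        raise ValueError(f"invalid union branch {branch}")
    return branch == 1


def encode_key_metadata(
    encryption_key: bytes,
    aad_prefix: Optional[bytes] = None,
    file_length: Optional[int] = None,
) -> bytes:
    if len(encryption_key) not in (16, 24, 32):
        raise ValueError("encryption_key must be 16, 24, or 32 bytes")
    buf = io.BytesIO()
    buf.write(bytes([VERSION_1]))  # VersionByte
    _write_bytes(buf, encryption_key)  # required field: no union wrapper
    if aad_prefix is None:
        _write_long(buf, 0)
    else:
        _write_long(buf, 1)
        _write_bytes(buf, aad_prefix)
    if file_length is None:
        _write_long(buf, 0)
    else:
        _write_long(buf, 1)
        _write_long(buf, file_length)
    return buf.getvalue()


def decode_key_metadata(blob: bytes) -> Tuple[bytes, Optional[bytes], Optional[int]]:
    buf = io.BytesIO(blob)
    if buf.read(1) != bytes([VERSION_1]):
        raise ValueError("unsupported key metadata version")
    key = _read_bytes(buf)
    aad_prefix = _read_bytes(buf) if _read_branch(buf) else None
    file_length = _read_long(buf) if _read_branch(buf) else None
    return key, aad_prefix, file_length
```

Decoding rejects any version byte other than `0x01`, matching the statement that `0x01` is currently the only valid version. Fields are written strictly in the order listed in the table, since raw Avro binary encoding carries no field names.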

##########
format/spec.md:
##########
@@ -1299,6 +1066,49 @@ Notes:
 
 1. The format of encrypted key metadata is determined by the table's encryption scheme and can be a wrapped format specific to the table's KMS provider.
 
+#### Standard Key Metadata
+
+The `key_metadata` field in manifest entries stores per-file encryption key material as a binary blob. To enable cross-implementation interoperability, the standard encryption scheme defines the following binary format for this field:
+
+```
+VersionByte Payload
+```
+
+where:
+
+* `VersionByte` is a single byte indicating the key metadata schema version. Currently, the only valid version is `0x01`.
+* `Payload` is an Avro binary-encoded record (not a container file — only the raw binary encoding of the fields) using the schema for the given version.
+
+The Avro schema for version 1 is a record with the following fields, in order:
+
+| Field name | Avro type | Required | Description |
+|---|---|---|---|
+| **`encryption_key`** | `bytes` | _required_ | The data encryption key (DEK) for this file. Must be 16, 24, or 32 bytes (corresponding to AES-128, AES-192, or AES-256). |
+| **`aad_prefix`** | `bytes` | _optional_ | Random AAD prefix used for encryption integrity protection. For [AES GCM Stream](gcm-stream-spec.md) files, the prefix is combined with a block index to form the per-block AAD. For [Parquet modular encryption](https://parquet.apache.org/docs/file-format/data-pages/encryption/), the prefix is passed as the `aad_file_unique` component. |
+| **`file_length`** | `long` | _optional_ | The encrypted file length in bytes. Required for [AES GCM Stream](gcm-stream-spec.md) encrypted files to detect truncation attacks (see [AES GCM Stream file length](gcm-stream-spec.md#file-length)). Not set for Parquet encrypted files. |
+
+The usage of the `encryption_key` and `aad_prefix` fields depends on the file format:
+
+* **AES GCM Stream files** (manifest lists, manifests, and non-Parquet data files): The `encryption_key` is used directly as the AES-GCM key. The `aad_prefix` is combined with a 4-byte little-endian block index to form the AAD for each cipher block, as described in the [AES GCM Stream AAD section](gcm-stream-spec.md#additional-authenticated-data). The `file_length` field stores the encrypted file length for truncation detection.

Review Comment:
   Iceberg table encryption is not yet supported for the ORC data format, so "manifest lists, manifests, and non-Parquet data files" should be something like "manifest lists, manifests, and Avro data files".

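For the AES GCM Stream case described in the second hunk, the per-block AAD is the `aad_prefix` followed by a 4-byte little-endian block index. A minimal Python sketch, assuming block indexes start at 0 and fit in an unsigned 32-bit value; both helper names are hypothetical, and the length check shows only one simple way a reader could use `file_length` for truncation detection, not necessarily what an existing implementation does.

```python
import struct


def gcm_stream_block_aad(aad_prefix: bytes, block_index: int) -> bytes:
    """Hypothetical helper: AAD for one cipher block = aad_prefix || LE32(index)."""
    if not 0 <= block_index < 2**32:
        raise ValueError("block index must fit in 4 bytes")
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return aad_prefix + struct.pack("<I", block_index)


def check_not_truncated(observed_length: int, file_length: int) -> None:
    """Hypothetical check: reject a file whose size differs from key metadata."""
    if observed_length != file_length:
        raise ValueError(
            f"encrypted file length {observed_length} != expected {file_length}"
        )
```

For example, `gcm_stream_block_aad(b"\xaa\xbb", 258)` returns `b"\xaa\xbb\x02\x01\x00\x00"`. Because every block's AAD binds its position, reordering blocks fails authentication; dropping trailing blocks does not, which is why the recorded `file_length` matters.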


-- 
This is an automated message from the Apache Git Service.