This is an automated email from the ASF dual-hosted git repository.
gershinsky pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new 440de4d PARQUET-2491: Address AES GCM invocation limit in encryption spec (#259)
440de4d is described below
commit 440de4de5271b1f7eb74e939540ca4932320f8f7
Author: ggershinsky <[email protected]>
AuthorDate: Thu Jun 13 13:23:45 2024 +0300
PARQUET-2491: Address AES GCM invocation limit in encryption spec (#259)
* address gcm invocation limit
* address review comments
* style
---
Encryption.md | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/Encryption.md b/Encryption.md
index a9c54c0..008519b 100644
--- a/Encryption.md
+++ b/Encryption.md
@@ -115,7 +115,6 @@ authentication layer called GMAC. For applications running without AES accelerat
(e.g. on Java versions before Java 9) and willing to compromise on content verification,
CTR cipher can provide a boost in encryption/decryption throughput.
-
#### 4.1.3 Nonces and IVs
GCM and CTR ciphers require a unique vector to be provided for each encrypted stream.
In this document, the unique input to GCM encryption is called nonce (“number used once”).
@@ -128,6 +127,28 @@ unique nonce with a length of 12 bytes (96 bits). Notice: the NIST
specification uses a term “IV” for what is called “nonce” in the Parquet encryption design.
+#### 4.1.4 Invocation limit
+According to section 8.3 of the NIST SP 800-38D document, *"The total number of invocations
+of the authenticated encryption function shall not exceed 2^32, including all IV lengths and
+all instances of the authenticated encryption function with the given key"*. This restriction is
+related to the "uniqueness requirement of IVs and keys" (section 8 in the NIST spec) - *"if even
+one IV is ever repeated, then the implementation may be vulnerable"*. *"Compliance with this
+requirement is crucial to the security of GCM"*.
+
+The bulk of modules in a Parquet file are page headers and data pages. Therefore, one encryption
+key shall not be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with
+multiple keys (footer and column keys), the constraint on the number of invocations is applied
+to each key separately.
+
+When running in the context of a larger system, any particular Parquet writer implementation likely
+does not have sufficient context to enforce key invocation limits system-wide. Therefore,
+the higher level system itself must arrange to supply keys appropriately to the various writer instances.
+
+Parquet writer implementations should have a local invocation counter for each encryption key. If the
+counter exceeds 2^32, the implementation should return an error and produce no more cipherblocks.
+While this does not enforce a system-wide limit, it helps in distributed systems that provide different
+keys to different nodes (or generate unique keys in each node).
+
### 4.2 Parquet encryption algorithms
#### 4.2.1 AES_GCM_V1
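
Editorial illustration (not part of the commit): the local per-key invocation counter that the new section 4.1.4 recommends could look roughly like the Python sketch below. The class, exception, and key identifier names are hypothetical and do not correspond to any Parquet library API. The sketch assumes one GCM invocation per encrypted module and that each page contributes two modules (a page header and the page data), which is how the 2^32 invocation limit translates into roughly 2^31 pages per key.

```python
GCM_INVOCATION_LIMIT = 2**32  # limit quoted from NIST SP 800-38D, section 8.3


class InvocationLimitExceeded(Exception):
    """Raised when a key has reached its local GCM invocation limit."""


class KeyInvocationCounter:
    """Hypothetical per-key counter; illustrative only."""

    def __init__(self, limit=GCM_INVOCATION_LIMIT):
        self.limit = limit
        self.counts = {}  # key identifier -> invocations used so far

    def register(self, key_id):
        """Call once before each GCM encryption of a module with this key."""
        used = self.counts.get(key_id, 0)
        if used >= self.limit:
            # One more invocation would exceed the limit: refuse to encrypt.
            raise InvocationLimitExceeded(key_id)
        self.counts[key_id] = used + 1
        return used + 1


# A small limit makes the behaviour easy to observe.
counter = KeyInvocationCounter(limit=4)
for _ in range(2):  # two pages, each with a header module and a data module
    counter.register("column_key")
    counter.register("column_key")
counter.register("footer_key")  # each key is counted separately
```

With the limit of 4 used here, a fifth `register("column_key")` call raises `InvocationLimitExceeded`, while `footer_key` still has invocations left. As the commit text notes, such a counter is local to one writer and does not enforce a system-wide limit.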