This is an automated email from the ASF dual-hosted git repository.

gershinsky pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 440de4d  PARQUET-2491: Address AES GCM invocation limit in encryption spec (#259)
440de4d is described below

commit 440de4de5271b1f7eb74e939540ca4932320f8f7
Author: ggershinsky <[email protected]>
AuthorDate: Thu Jun 13 13:23:45 2024 +0300

    PARQUET-2491: Address AES GCM invocation limit in encryption spec (#259)
    
    * address gcm invocation limit
    
    * address review comments
    
    * style
---
 Encryption.md | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/Encryption.md b/Encryption.md
index a9c54c0..008519b 100644
--- a/Encryption.md
+++ b/Encryption.md
@@ -115,7 +115,6 @@ authentication layer called GMAC. For applications running without AES accelerat
 (e.g. on Java versions before Java 9) and willing to compromise on content verification,
 CTR cipher can provide a boost in encryption/decryption throughput.
 
-
 #### 4.1.3 Nonces and IVs
 GCM and CTR ciphers require a unique vector to be provided for each encrypted stream.
 In this document, the unique input to GCM encryption is called nonce (“number used once”).
@@ -128,6 +127,28 @@ unique nonce with a length of 12 bytes (96 bits). Notice: the NIST
 specification uses a term “IV” for what is called “nonce” in the Parquet encryption design.
 
 
+#### 4.1.4 Invocation limit
+According to section 8.3 of the NIST SP 800-38D document, *"The total number of invocations
+of the authenticated encryption function shall not exceed 2^32, including all IV lengths and
+all instances of the authenticated encryption function with the given key"*. This restriction is
+related to the "uniqueness requirement of IVs and keys" (section 8 in the NIST spec): *"if even
+one IV is ever repeated, then the implementation may be vulnerable"*. *"Compliance with this
+requirement is crucial to the security of GCM"*.
+
+The bulk of modules in a Parquet file are page headers and data pages, so each page accounts for two
+encryption invocations (its page header and the page itself). Therefore, one encryption key shall not
+be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with multiple keys (footer
+and column keys), the constraint on the number of invocations is applied to each key separately.
+
+When running in the context of a larger system, any particular Parquet writer implementation likely
+does not have sufficient context to enforce key invocation limits system-wide. Therefore,
+the higher-level system itself must arrange to supply keys appropriately to the various writer instances.
+
+Parquet writer implementations should have a local invocation counter for each encryption key. If the
+counter exceeds 2^32, the implementation should return an error and produce no more cipherblocks.
+While this does not enforce a system-wide limit, it helps in distributed systems that provide different
+keys to different nodes (or generate unique keys in each node).
+
 ### 4.2 Parquet encryption algorithms
 
 #### 4.2.1 AES_GCM_V1
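
The 2^31 page bound in section 4.1.4 is plain arithmetic. The derivation below assumes, as the text does, that every page costs two invocations (its page header and the page itself). Other modules, such as the footer and column metadata, only add invocations, so 2^31 is an upper bound on the pages one key can protect:

$$
2\,N_{\text{pages}} \le N_{\text{invocations}} \le 2^{32}
\;\Longrightarrow\;
N_{\text{pages}} \le 2^{31} = 2\,147\,483\,648 \approx 2.1 \times 10^{9}
$$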
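
Section 4.1.4 recommends a local invocation counter for each encryption key. Below is a minimal Python sketch of such a counter. The names `KeyInvocationCounter` and `InvocationLimitExceeded`, and the use of string key identifiers, are hypothetical and not taken from any Parquet library. The sketch assumes the writer calls `reserve()` once before each module encryption, which means twice per page (page header and page). The lock is there on the assumption that a writer may encrypt several column chunks concurrently.

```python
import threading


class InvocationLimitExceeded(RuntimeError):
    """Hypothetical error raised once a key's invocation budget is spent."""


class KeyInvocationCounter:
    """Hypothetical local (per-writer) counter of AES-GCM invocations per key.

    Each key is counted separately, as the spec requires for files that use
    a footer key and several column keys. The default limit is 2**32, from
    NIST SP 800-38D section 8.3.
    """

    DEFAULT_LIMIT = 2**32

    def __init__(self, limit=DEFAULT_LIMIT):
        self._limit = limit
        self._counts = {}  # key identifier -> invocations used so far
        self._lock = threading.Lock()

    def reserve(self, key_id):
        """Record one encryption invocation for key_id and return the new count.

        Raises InvocationLimitExceeded instead of letting the count go past
        the limit, so the caller produces no more cipherblocks with this key.
        """
        with self._lock:
            used = self._counts.get(key_id, 0)
            if used >= self._limit:
                raise InvocationLimitExceeded(
                    f"key {key_id!r} already used for {self._limit} invocations"
                )
            self._counts[key_id] = used + 1
            return used + 1

    def used(self, key_id):
        """Return how many invocations have been recorded for key_id."""
        with self._lock:
            return self._counts.get(key_id, 0)
```

For example, with a deliberately tiny `limit=4`, two pages encrypted with a column key `"col_a"` use up its budget (2 pages × 2 invocations). A fifth `reserve("col_a")` then raises `InvocationLimitExceeded`, while a different key such as `"col_b"` is unaffected. As the spec notes, this counter only protects one writer instance. Keeping the system-wide total under 2^32 remains the job of whatever component distributes keys.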
