This is an automated email from the ASF dual-hosted git repository.

gershinsky pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git


The following commit(s) were added to refs/heads/master by this push:
     new e74785d  Added AES-256 encrypted parquet generated from parquet-java (#102)
e74785d is described below

commit e74785d85a4ecee829e1e405444d6a1b24b8bc9c
Author: hsiang-c <[email protected]>
AuthorDate: Tue Mar 3 22:49:07 2026 -0800

    Added AES-256 encrypted parquet generated from parquet-java (#102)
    
    * Added AES-256 encrypted parquet from modified parquet-java
    
    * Match schema and values
    
    * Match INT96 values
    
    * Note on AES256 test data
    
    * Binary in ByteOrder.LITTLE_ENDIAN
    
    * Fix byte order
    
    * Updated notes
---
 data/README.md                                     |  54 +++++++++++++++++++++
 .../encrypt_columns_and_footer.parquet.encrypted   | Bin 0 -> 9858 bytes
 ...ncrypt_columns_and_footer_ctr.parquet.encrypted | Bin 0 -> 9714 bytes
 ...nd_footer_disable_aad_storage.parquet.encrypted | Bin 0 -> 9858 bytes
 ...rypt_columns_plaintext_footer.parquet.encrypted | Bin 0 -> 8669 bytes
 data/aes256/uniform_encryption.parquet.encrypted   | Bin 0 -> 8258 bytes
 6 files changed, 54 insertions(+)

diff --git a/data/README.md b/data/README.md
index 459dd14..eed8b46 100644
--- a/data/README.md
+++ b/data/README.md
@@ -106,6 +106,60 @@ which is compatible with the `TestOnlyInServerWrapKms` KMS client used in C++ te
 The `encrypt_columns_and_footer_bloom_filter.parquet.encrypted` file enables Bloom filters
 on `double_field` and `float_field`.
 
+The files in `data/aes256` were encrypted with the following keys and key ids (when using key\_retriever) using parquet-mr:
+* Encrypted/Signed Footer:
+  * key:   {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1}
+  * key_id: "kf"
+* Encrypted column named double_field (including column and offset index):
+  * key:  {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2}
+  * key_id: "kc1"
+* Encrypted column named float_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,3}
+  * key_id: "kc2"
+* Encrypted column named boolean_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,4}
+  * key_id: "kc3"
+* Encrypted column named int32_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,5}
+  * key_id: "kc4"
+* Encrypted column named ba_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,6}
+  * key_id: "kc5"
+* Encrypted column named flba_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,7}
+  * key_id: "kc6"
+* Encrypted column named int64_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,8}
+  * key_id: "kc7"
+* Encrypted column named int96_field (including column and offset index):
+  * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,9}
+  * key_id: "kc8"
+
+The corresponding schema in Java is:
+
+```java
+// byte order is LITTLE_ENDIAN and PageWrite checksum is disabled.
+public static final String BOOLEAN_FIELD_NAME = "boolean_field";
+public static final String INT32_FIELD_NAME = "int32_field";
+public static final String INT64_FIELD_NAME = "int64_field";
+public static final String INT96_FIELD_NAME = "int96_field";
+public static final String FLOAT_FIELD_NAME = "float_field";
+public static final String DOUBLE_FIELD_NAME = "double_field";
+public static final String BINARY_FIELD_NAME = "ba_field";
+public static final String FIXED_LENGTH_BINARY_FIELD_NAME = "flba_field";
+
+private static final MessageType SCHEMA = new MessageType(
+    "schema",
+    new PrimitiveType(REQUIRED, BOOLEAN, BOOLEAN_FIELD_NAME),
+    Types.required(INT32).as(LogicalTypeAnnotation.timeType(true, MILLIS)).named(INT32_FIELD_NAME),
+    new PrimitiveType(REPEATED, INT64, INT64_FIELD_NAME),
+    Types.required(INT96).named(INT96_FIELD_NAME),
+    new PrimitiveType(REQUIRED, FLOAT, FLOAT_FIELD_NAME),
+    new PrimitiveType(REQUIRED, DOUBLE, DOUBLE_FIELD_NAME),
+    new PrimitiveType(OPTIONAL, BINARY, BINARY_FIELD_NAME),
+    Types.required(FIXED_LEN_BYTE_ARRAY).length(FIXED_LENGTH).named(FIXED_LENGTH_BINARY_FIELD_NAME));
+```
+
 ## Checksum Files
 
 The schema for the `datapage_v1-*-checksum.parquet` test files is:
diff --git a/data/aes256/encrypt_columns_and_footer.parquet.encrypted b/data/aes256/encrypt_columns_and_footer.parquet.encrypted
new file mode 100644
index 0000000..0d0935f
Binary files /dev/null and b/data/aes256/encrypt_columns_and_footer.parquet.encrypted differ
diff --git a/data/aes256/encrypt_columns_and_footer_ctr.parquet.encrypted b/data/aes256/encrypt_columns_and_footer_ctr.parquet.encrypted
new file mode 100644
index 0000000..65b8861
Binary files /dev/null and b/data/aes256/encrypt_columns_and_footer_ctr.parquet.encrypted differ
diff --git a/data/aes256/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted b/data/aes256/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted
new file mode 100644
index 0000000..e4a87c5
Binary files /dev/null and b/data/aes256/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted differ
diff --git a/data/aes256/encrypt_columns_plaintext_footer.parquet.encrypted b/data/aes256/encrypt_columns_plaintext_footer.parquet.encrypted
new file mode 100644
index 0000000..7759ba9
Binary files /dev/null and b/data/aes256/encrypt_columns_plaintext_footer.parquet.encrypted differ
diff --git a/data/aes256/uniform_encryption.parquet.encrypted b/data/aes256/uniform_encryption.parquet.encrypted
new file mode 100644
index 0000000..6f14482
Binary files /dev/null and b/data/aes256/uniform_encryption.parquet.encrypted differ
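
The keys listed in the README hunk above follow a regular pattern, so a test harness does not have to copy nine 32-element literals by hand. The sketch below is one possible way to rebuild them in plain Java. It assumes that the `{0,1,2,...}` notation means raw byte values rather than ASCII digit characters; the README does not say which, so check against the generating code before relying on it. The class name `Aes256TestKeys` and its method names are hypothetical and are not part of parquet-java or parquet-testing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of parquet-java or parquet-testing.
final class Aes256TestKeys {
    private static final int KEY_LENGTH = 32; // AES-256 keys are 32 bytes

    // Footer key "kf": the digits 0..9 repeated, starting at 0.
    // ASSUMPTION: raw byte values 0..9, not the ASCII characters '0'..'9'.
    static byte[] footerKey() {
        byte[] key = new byte[KEY_LENGTH];
        for (int i = 0; i < KEY_LENGTH; i++) {
            key[i] = (byte) (i % 10);
        }
        return key;
    }

    // Column key "kc<n>" for n in 1..8: the digits 1..9,0 repeated,
    // with the final byte set to n + 1 (2 for kc1, ..., 9 for kc8).
    static byte[] columnKey(int n) {
        if (n < 1 || n > 8) {
            throw new IllegalArgumentException("column key index must be 1..8, got " + n);
        }
        byte[] key = new byte[KEY_LENGTH];
        for (int i = 0; i < KEY_LENGTH; i++) {
            key[i] = (byte) ((i + 1) % 10);
        }
        key[KEY_LENGTH - 1] = (byte) (n + 1);
        return key;
    }

    // Key id -> key bytes, in the order the README lists them.
    static Map<String, byte[]> keysById() {
        Map<String, byte[]> keys = new LinkedHashMap<>();
        keys.put("kf", footerKey());
        for (int n = 1; n <= 8; n++) {
            keys.put("kc" + n, columnKey(n));
        }
        return keys;
    }

    // Column name -> key id, as listed in the README.
    static Map<String, String> columnKeyIds() {
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("double_field", "kc1");
        ids.put("float_field", "kc2");
        ids.put("boolean_field", "kc3");
        ids.put("int32_field", "kc4");
        ids.put("ba_field", "kc5");
        ids.put("flba_field", "kc6");
        ids.put("int64_field", "kc7");
        ids.put("int96_field", "kc8");
        return ids;
    }

    public static void main(String[] args) {
        keysById().forEach((id, key) ->
                System.out.println(id + " -> " + Arrays.toString(key)));
    }
}
```

A reader-side key retriever could look up `keysById()` by the key id stored in the file's key metadata. If the keys turn out to be ASCII digits instead, adding `'0'` to each byte in the two generator methods would be the only change needed.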
