This is an automated email from the ASF dual-hosted git repository.
gershinsky pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
new e74785d Added AES-256 encrypted parquet generated from parquet-java (#102)
e74785d is described below
commit e74785d85a4ecee829e1e405444d6a1b24b8bc9c
Author: hsiang-c <[email protected]>
AuthorDate: Tue Mar 3 22:49:07 2026 -0800
Added AES-256 encrypted parquet generated from parquet-java (#102)
* Added AES-256 encrypted parquet from modified parquet-java
* Match schema and values
* Match INT96 values
* Note on AES256 test data
* Binary in ByteOrder.LITTLE_ENDIAN
* Fix byte order
* Updated notes
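The `Binary in ByteOrder.LITTLE_ENDIAN` and `Fix byte order` notes presumably refer to `java.nio.ByteOrder`. The following is a general illustration only and is not taken from the commit's generator code. It shows how the same `int` is laid out in a `ByteBuffer` under each byte order. The class name `ByteOrderDemo` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration: the same int value encoded with two byte orders.
public class ByteOrderDemo {
    // Encodes an int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        // Little-endian stores the least significant byte first: [4, 3, 2, 1]
        System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN)));
        // Big-endian (ByteBuffer's default order) stores it last: [1, 2, 3, 4]
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));
    }
}
```

Byte order matters when multi-byte values are packed into `BINARY` or `FIXED_LEN_BYTE_ARRAY` columns. A reader comparing decoded values against the ones written must interpret those bytes in the same order the writer used.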
---
data/README.md | 54 +++++++++++++++++++++
.../encrypt_columns_and_footer.parquet.encrypted | Bin 0 -> 9858 bytes
...ncrypt_columns_and_footer_ctr.parquet.encrypted | Bin 0 -> 9714 bytes
...nd_footer_disable_aad_storage.parquet.encrypted | Bin 0 -> 9858 bytes
...rypt_columns_plaintext_footer.parquet.encrypted | Bin 0 -> 8669 bytes
data/aes256/uniform_encryption.parquet.encrypted | Bin 0 -> 8258 bytes
6 files changed, 54 insertions(+)
diff --git a/data/README.md b/data/README.md
index 459dd14..eed8b46 100644
--- a/data/README.md
+++ b/data/README.md
@@ -106,6 +106,60 @@ which is compatible with the `TestOnlyInServerWrapKms` KMS client used in C++ te
The `encrypt_columns_and_footer_bloom_filter.parquet.encrypted` file enables Bloom filters
on `double_field` and `float_field`.
+The files in `data/aes256` were encrypted with the following keys and key ids (when using key\_retriever) using parquet-mr:
+* Encrypted/Signed Footer:
+ * key: {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1}
+ * key_id: "kf"
+* Encrypted column named double_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2}
+ * key_id: "kc1"
+* Encrypted column named float_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,3}
+ * key_id: "kc2"
+* Encrypted column named boolean_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,4}
+ * key_id: "kc3"
+* Encrypted column named int32_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,5}
+ * key_id: "kc4"
+* Encrypted column named ba_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,6}
+ * key_id: "kc5"
+* Encrypted column named flba_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,7}
+ * key_id: "kc6"
+* Encrypted column named int64_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,8}
+ * key_id: "kc7"
+* Encrypted column named int96_field (including column and offset index):
+ * key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,9}
+ * key_id: "kc8"
+
+The corresponding schema in Java is:
+
+```java
+// byte order is LITTLE_ENDIAN and PageWrite checksum is disabled.
+public static final String BOOLEAN_FIELD_NAME = "boolean_field";
+public static final String INT32_FIELD_NAME = "int32_field";
+public static final String INT64_FIELD_NAME = "int64_field";
+public static final String INT96_FIELD_NAME = "int96_field";
+public static final String FLOAT_FIELD_NAME = "float_field";
+public static final String DOUBLE_FIELD_NAME = "double_field";
+public static final String BINARY_FIELD_NAME = "ba_field";
+public static final String FIXED_LENGTH_BINARY_FIELD_NAME = "flba_field";
+
+private static final MessageType SCHEMA = new MessageType(
+ "schema",
+ new PrimitiveType(REQUIRED, BOOLEAN, BOOLEAN_FIELD_NAME),
+ Types.required(INT32).as(LogicalTypeAnnotation.timeType(true, MILLIS)).named(INT32_FIELD_NAME),
+ new PrimitiveType(REPEATED, INT64, INT64_FIELD_NAME),
+ Types.required(INT96).named(INT96_FIELD_NAME),
+ new PrimitiveType(REQUIRED, FLOAT, FLOAT_FIELD_NAME),
+ new PrimitiveType(REQUIRED, DOUBLE, DOUBLE_FIELD_NAME),
+ new PrimitiveType(OPTIONAL, BINARY, BINARY_FIELD_NAME),
+ Types.required(FIXED_LEN_BYTE_ARRAY).length(FIXED_LENGTH).named(FIXED_LENGTH_BINARY_FIELD_NAME));
+```
+
## Checksum Files
The schema for the `datapage_v1-*-checksum.parquet` test files is:
diff --git a/data/aes256/encrypt_columns_and_footer.parquet.encrypted b/data/aes256/encrypt_columns_and_footer.parquet.encrypted
new file mode 100644
index 0000000..0d0935f
Binary files /dev/null and b/data/aes256/encrypt_columns_and_footer.parquet.encrypted differ
diff --git a/data/aes256/encrypt_columns_and_footer_ctr.parquet.encrypted b/data/aes256/encrypt_columns_and_footer_ctr.parquet.encrypted
new file mode 100644
index 0000000..65b8861
Binary files /dev/null and b/data/aes256/encrypt_columns_and_footer_ctr.parquet.encrypted differ
diff --git a/data/aes256/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted b/data/aes256/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted
new file mode 100644
index 0000000..e4a87c5
Binary files /dev/null and b/data/aes256/encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted differ
diff --git a/data/aes256/encrypt_columns_plaintext_footer.parquet.encrypted b/data/aes256/encrypt_columns_plaintext_footer.parquet.encrypted
new file mode 100644
index 0000000..7759ba9
Binary files /dev/null and b/data/aes256/encrypt_columns_plaintext_footer.parquet.encrypted differ
diff --git a/data/aes256/uniform_encryption.parquet.encrypted b/data/aes256/uniform_encryption.parquet.encrypted
new file mode 100644
index 0000000..6f14482
Binary files /dev/null and b/data/aes256/uniform_encryption.parquet.encrypted differ
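The key table added to `data/README.md` maps each key id to a 32-byte (256-bit) AES key. What follows is a minimal, stdlib-only sketch of that table as a lookup. It assumes, as the README's "key ids (when using key\_retriever)" wording suggests, that each file stores the UTF-8 bytes of the key id as its key metadata. The class name `Aes256TestKeys` is hypothetical. Its `getKey(byte[])` method mirrors the shape of parquet-java's `DecryptionKeyRetriever.getKey`, but the class does not implement that interface, so it compiles without parquet-java on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: key id -> AES-256 key table for the data/aes256 test files.
// Assumes the key metadata in each file is the UTF-8 encoded key id ("kf", "kc1", ...).
public class Aes256TestKeys {
    // Footer key "kf": {0,1,...,9,0,1,...,9,0,1,...,9,0,1}
    static final byte[] FOOTER_KEY = {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5,
        6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1};

    private final Map<String, byte[]> keysById = new HashMap<>();

    public Aes256TestKeys() {
        keysById.put("kf", FOOTER_KEY);
        // Column keys kc1..kc8 share the prefix {1,...,9,0} three times followed by 1,
        // and differ only in the last byte, which runs from 2 (kc1) to 9 (kc8).
        for (int i = 1; i <= 8; i++) {
            keysById.put("kc" + i, columnKey((byte) (i + 1)));
        }
    }

    // Builds a 32-byte column key: bytes 0..30 follow (j + 1) % 10, byte 31 is lastByte.
    private static byte[] columnKey(byte lastByte) {
        byte[] key = new byte[32];
        for (int j = 0; j < 31; j++) {
            key[j] = (byte) ((j + 1) % 10);
        }
        key[31] = lastByte;
        return key;
    }

    // Returns a copy of the key whose id is the UTF-8 decoding of keyMetaData.
    public byte[] getKey(byte[] keyMetaData) {
        String keyId = new String(keyMetaData, StandardCharsets.UTF_8);
        byte[] key = keysById.get(keyId);
        if (key == null) {
            throw new IllegalArgumentException("Unknown key id: " + keyId);
        }
        return key.clone();
    }

    public static void main(String[] args) {
        Aes256TestKeys keys = new Aes256TestKeys();
        byte[] kc8 = keys.getKey("kc8".getBytes(StandardCharsets.UTF_8));
        // Prints "32 9": every key is 32 bytes, and kc8 (int96_field) ends in 9.
        System.out.println(kc8.length + " " + kc8[31]);
    }
}
```

Using this with parquet-java would mean wrapping the lookup in a `DecryptionKeyRetriever` and passing it to `FileDecryptionProperties.builder().withKeyRetriever(...)`. That wiring is left out here.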