adamreeve commented on code in PR #8305:
URL: https://github.com/apache/arrow-rs/pull/8305#discussion_r2543811192


##########
parquet/src/file/metadata/writer.rs:
##########
@@ -758,13 +758,42 @@ impl MetadataObjectWriter {
         match column_chunk.column_crypto_metadata.as_deref() {
             None => {}
             Some(ColumnCryptoMetaData::ENCRYPTION_WITH_FOOTER_KEY) => {
+                use crate::file::metadata::thrift::serialize_column_meta_data;
                 // When uniform encryption is used the footer is already 
encrypted,
                 // so the column chunk does not need additional encryption.
+                // Except if we're in plaintext footer mode, then we need to 
encrypt
+                // the column metadata here.
+                let is_footer_encrypted = 
file_encryptor.properties().encrypt_footer();
+
+                if !is_footer_encrypted {
+                    // Temporarily clear crypto_metadata so statistics get 
included in encrypted blob
+                    let crypto_metadata = 
column_chunk.column_crypto_metadata.take();

Review Comment:
   This is nearly identical to the block below except for the different 
encryptors used. This should be able to be refactored to avoid this, eg. by 
changing the match to return an `Option<Box<dyn BlockEncryptor>>` and then have 
a separate match on that result that does the encryption.



##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -1197,13 +1197,30 @@ pub(super) fn serialize_column_meta_data<W: Write>(
     if let Some(dictionary_page_offset) = column_chunk.dictionary_page_offset {
         last_field_id = dictionary_page_offset.write_thrift_field(w, 11, 
last_field_id)?;
     }
-    // PageStatistics is the same as thrift Statistics, but writable
-    let stats = page_stats_to_thrift(column_chunk.statistics());
-    if let Some(stats) = stats {
-        last_field_id = stats.write_thrift_field(w, 12, last_field_id)?;
+
+    // Only write statistics to plaintext footer if column is not encrypted
+
+    #[cfg(feature = "encryption")]
+    if column_chunk.crypto_metadata().is_none() {

Review Comment:
   The contents of these two blocks is the same, so we could tidy this up by 
adding a new function to decide whether to write stats and remove the inline 
`#[cfg()]` here, eg:
   
   ```rust
   #[cfg(feature = "encryption")]
   fn should_write_column_stats(column_chunk: &ColumnChunkMetaData) -> bool {
       // If there is crypto metadata, the column is encrypted with a different 
key to the footer
       // or a plaintext footer is used, so the statistics are sensitive and 
shouldn't be written
       column_chunk.crypto_metadata().is_none()
   }
   
   #[cfg(not(feature = "encryption"))]
   fn should_write_column_stats(column_chunk: &ColumnChunkMetaData) -> bool {
       true
   }
   ```



##########
parquet/tests/encryption/encryption.rs:
##########
@@ -719,6 +719,132 @@ fn test_write_uniform_encryption_plaintext_footer() {
     );
 }
 
+#[test]
+pub fn test_non_uniform_plaintext_encryption_behaviour() {

Review Comment:
   I find this test a bit hard to follow, I'll see if I can suggest a better 
structure and make a PR to your branch.



##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -1459,12 +1476,11 @@ impl WriteThrift for ColumnChunkMetaData {
 
         #[cfg(feature = "encryption")]
         {
-            // only write the ColumnMetaData if we haven't already encrypted it
-            if self.encrypted_column_metadata.is_none() {
-                writer.write_field_begin(FieldType::Struct, 3, last_field_id)?;
-                serialize_column_meta_data(self, writer)?;
-                last_field_id = 3;
-            }
+            // Always write the ColumnMetaData struct

Review Comment:
   This now duplicates the block below so these can be merged and the 
`#[cfg...` removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to