adamreeve commented on code in PR #8305:
URL: https://github.com/apache/arrow-rs/pull/8305#discussion_r2586593636


##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -1236,13 +1210,16 @@ pub(super) fn serialize_column_meta_data<W: Write>(
     if let Some(dictionary_page_offset) = column_chunk.dictionary_page_offset {
         last_field_id = dictionary_page_offset.write_thrift_field(w, 11, 
last_field_id)?;
     }
-    // PageStatistics is the same as thrift Statistics, but writable
-    let stats = page_stats_to_thrift(column_chunk.statistics());
-    if let Some(stats) = stats {
-        last_field_id = stats.write_thrift_field(w, 12, last_field_id)?;
-    }
-    if let Some(page_encoding_stats) = column_chunk.page_encoding_stats() {
-        last_field_id = page_encoding_stats.write_thrift_field(w, 13, 
last_field_id)?;
+
+    if should_write_column_stats(column_chunk) {

Review Comment:
   Good point, I agree all the below fields should be skipped too.
   
   I checked what the C++ implementation does and it's actually a bit different 
between the plaintext footer case and when the footer is encrypted but the 
column is encrypted with a different key: 
https://github.com/apache/arrow/blob/cbd36b817fc77812f8df1a15bf24314de3b27f29/cpp/src/parquet/metadata.cc#L1748-L1755.
   
   When there's a plaintext footer, only the `statistics` and `encoding_stats` 
are stripped out of the unencrypted metadata, similar to what we're proposing 
here. But when the footer is also encrypted, it's assumed that the reader can 
handle when the whole `metadata` field isn't set so this is completely skipped, 
which is what was done before.
   
   It might make sense to match what C++ does and keep the previous behaviour 
of excluding the full `ColumnChunk` `metadata` field when the footer is 
encrypted, and check that the reader can handle this when it doesn't have the 
column key. And then in the plaintext footer case, also exclude the other 
fields below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to