adamreeve commented on issue #8304:
URL: https://github.com/apache/arrow-rs/issues/8304#issuecomment-3331342778

   I've looked a bit into this and just want to document my findings.
   
   This behaviour is discussed in the format docs in [section 
5.3](https://github.com/apache/parquet-format/blob/c3f7be7dc615de76910154d690c534768ba25d6f/Encryption.md#53-protection-of-sensitive-metadata).
   
   * 
[`encrypted_column_metadata`](https://github.com/apache/parquet-format/blob/c3f7be7dc615de76910154d690c534768ba25d6f/src/main/thrift/parquet.thrift#L998)
 in the `ColumnChunk` is the encrypted version of 
[`meta_data`](https://github.com/apache/parquet-format/blob/c3f7be7dc615de76910154d690c534768ba25d6f/src/main/thrift/parquet.thrift#L980)
 and should be set when a plaintext footer is used and the column is encrypted, 
or when a column is encrypted with a different key to the footer.
   * When reading with decryption properties provided, the 
`encrypted_column_metadata` should be decrypted and used to replace the 
`meta_data`. It looks like this should already [happen 
here](https://github.com/apache/arrow-rs/blob/aa626e12de8bc0d0f56b5349239cae1be8d1a195/parquet/src/file/metadata/mod.rs#L638),
 although I'm not 100% sure we hit this code path with a plaintext footer.
   * Currently, when a `ColumnChunk` is created for writing, the encrypted 
metadata is [always 
None](https://github.com/apache/arrow-rs/blob/aa626e12de8bc0d0f56b5349239cae1be8d1a195/parquet/src/file/metadata/mod.rs#L1249),
 so we need to fix both the case of a plaintext footer and the case where a 
column is encrypted with a different key to the footer.
   * In C++ Parquet, the logic for writing encrypted column metadata appears to 
be handled 
[here](https://github.com/apache/arrow/blob/cbd36b817fc77812f8df1a15bf24314de3b27f29/cpp/src/parquet/metadata.cc#L1727-L1756):
        * When the footer is encrypted and a column is encrypted with another 
key, the whole `meta_data` field is removed. I guess this is because readers 
that support encryption are expected to handle its absence when they 
can't decrypt the column.
        * When a plaintext footer is used, the unencrypted `meta_data` field is 
kept, but its `statistics` and `encoding_stats` are removed.
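
   To make the write-side rules above concrete, here is a rough Rust sketch of 
the decision logic as I understand it from the C++ code. It is not the arrow-rs 
or C++ implementation: `ColumnMetaData`, `ColumnChunk`, `ColumnKey` and 
`build_column_chunk` are simplified, hypothetical stand-ins, and the `encrypt` 
closure is a placeholder for the real serialize-and-encrypt step.

```rust
/// Hypothetical, simplified stand-in for the Thrift `ColumnMetaData` struct.
#[derive(Clone, Debug, PartialEq)]
struct ColumnMetaData {
    num_values: i64,
    statistics: Option<String>,
    encoding_stats: Option<String>,
}

/// Hypothetical, simplified stand-in for the Thrift `ColumnChunk` struct.
#[derive(Debug, PartialEq)]
struct ColumnChunk {
    meta_data: Option<ColumnMetaData>,
    encrypted_column_metadata: Option<Vec<u8>>,
}

/// How a column is encrypted, if at all (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnKey {
    Unencrypted,
    FooterKey,
    ColumnSpecificKey,
}

/// Builds the `ColumnChunk` to write, following the rules described above.
/// `encrypt` is a placeholder for serializing and encrypting the metadata.
fn build_column_chunk(
    meta: ColumnMetaData,
    key: ColumnKey,
    encrypted_footer: bool,
    encrypt: impl Fn(&ColumnMetaData) -> Vec<u8>,
) -> ColumnChunk {
    // The metadata is encrypted separately when the footer is plaintext and
    // the column is encrypted, or when the column has its own key.
    let needs_separate_encryption = match key {
        ColumnKey::Unencrypted => false,
        ColumnKey::FooterKey => !encrypted_footer,
        ColumnKey::ColumnSpecificKey => true,
    };
    if !needs_separate_encryption {
        return ColumnChunk {
            meta_data: Some(meta),
            encrypted_column_metadata: None,
        };
    }

    let encrypted = encrypt(&meta);
    let meta_data = if encrypted_footer {
        // Encrypted footer: drop the plaintext `meta_data` entirely.
        None
    } else {
        // Plaintext footer: keep a redacted copy without the statistics.
        Some(ColumnMetaData {
            statistics: None,
            encoding_stats: None,
            ..meta
        })
    };
    ColumnChunk {
        meta_data,
        encrypted_column_metadata: Some(encrypted),
    }
}
```

   With this sketch, a column with its own key under an encrypted footer ends 
up with `meta_data: None`, while any encrypted column under a plaintext footer 
keeps a `meta_data` whose `statistics` and `encoding_stats` are `None`.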

