adamreeve commented on issue #8304:
URL: https://github.com/apache/arrow-rs/issues/8304#issuecomment-3331342778

I've looked a bit into this and just want to document my findings. This behaviour is discussed in the format docs in [section 5.3](https://github.com/apache/parquet-format/blob/c3f7be7dc615de76910154d690c534768ba25d6f/Encryption.md#53-protection-of-sensitive-metadata).

* [`encrypted_column_metadata`](https://github.com/apache/parquet-format/blob/c3f7be7dc615de76910154d690c534768ba25d6f/src/main/thrift/parquet.thrift#L998) in the `ColumnChunk` is the encrypted version of [`meta_data`](https://github.com/apache/parquet-format/blob/c3f7be7dc615de76910154d690c534768ba25d6f/src/main/thrift/parquet.thrift#L980) and should be set when a plaintext footer is used and the column is encrypted, or when a column is encrypted with a different key to the footer.
* When reading with decryption properties provided, the `encrypted_column_metadata` should be decrypted and used to replace the `meta_data`. It looks like this should already [happen here](https://github.com/apache/arrow-rs/blob/aa626e12de8bc0d0f56b5349239cae1be8d1a195/parquet/src/file/metadata/mod.rs#L638), although I'm not 100% sure we hit this code path with a plaintext footer.
* Currently, when a `ColumnChunk` is created for writing, the encrypted metadata is [always None](https://github.com/apache/arrow-rs/blob/aa626e12de8bc0d0f56b5349239cae1be8d1a195/parquet/src/file/metadata/mod.rs#L1249), so we need to fix both the case of a plaintext footer and when a column is encrypted with a different key to the footer.
* It looks like [here](https://github.com/apache/arrow/blob/cbd36b817fc77812f8df1a15bf24314de3b27f29/cpp/src/parquet/metadata.cc#L1727-L1756) is where the logic for writing encrypted column metadata is handled in C++ Parquet:
  * When the footer is encrypted and a column is encrypted with another key, the whole `meta_data` field is removed. I guess this is because readers that support encryption are expected to handle when this is missing if they can't decrypt the column.
  * When a plaintext footer is used, the unencrypted `meta_data` field is kept, but its `statistics` and `encoding_stats` are removed.
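
To make the cases in the bullets above a bit more concrete, here is a rough sketch of the write-side decision as I understand it. Everything in it is made up for illustration: `FooterMode`, `ColumnKey`, the simplified `ColumnMetaData` and `ColumnChunk` structs, `placeholder_encrypt` and `build_column_chunk` are hypothetical names, not the arrow-rs or parquet-format types, and the "encryption" is only a stand-in that serialises the struct to bytes.

```rust
// Hypothetical, simplified model of the write-side logic. None of these
// types or functions are the real arrow-rs / parquet-format definitions.

#[derive(Debug, Clone, PartialEq)]
struct ColumnMetaData {
    num_values: i64,
    statistics: Option<String>,
    encoding_stats: Option<Vec<String>>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum FooterMode {
    Plaintext,
    Encrypted,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKey {
    /// The column is not encrypted at all.
    Unencrypted,
    /// The column is encrypted with the same key as the footer.
    FooterKey,
    /// The column is encrypted with its own key.
    ColumnSpecific,
}

#[derive(Debug, PartialEq)]
struct ColumnChunk {
    meta_data: Option<ColumnMetaData>,
    encrypted_column_metadata: Option<Vec<u8>>,
}

/// Stand-in for real encryption: just turns the struct into bytes.
fn placeholder_encrypt(meta: &ColumnMetaData) -> Vec<u8> {
    format!("{:?}", meta).into_bytes()
}

fn build_column_chunk(footer: FooterMode, key: ColumnKey, meta: ColumnMetaData) -> ColumnChunk {
    // Encrypted column metadata is needed when the column is encrypted and
    // either the footer is plaintext or the column uses a different key.
    let needs_encrypted_metadata = match (footer, key) {
        (_, ColumnKey::Unencrypted) => false,
        (FooterMode::Encrypted, ColumnKey::FooterKey) => false,
        (FooterMode::Plaintext, _) => true,
        (FooterMode::Encrypted, ColumnKey::ColumnSpecific) => true,
    };

    if !needs_encrypted_metadata {
        return ColumnChunk {
            meta_data: Some(meta),
            encrypted_column_metadata: None,
        };
    }

    // Encrypt the full metadata before anything is stripped from it.
    let encrypted = placeholder_encrypt(&meta);

    let meta_data = match footer {
        // Encrypted footer: drop the plaintext metadata entirely.
        FooterMode::Encrypted => None,
        // Plaintext footer: keep a redacted copy without the sensitive stats.
        FooterMode::Plaintext => Some(ColumnMetaData {
            statistics: None,
            encoding_stats: None,
            ..meta
        }),
    };

    ColumnChunk {
        meta_data,
        encrypted_column_metadata: Some(encrypted),
    }
}
```

For example, `build_column_chunk(FooterMode::Encrypted, ColumnKey::ColumnSpecific, meta)` gives a chunk with no `meta_data` and a populated `encrypted_column_metadata`, while a plaintext footer keeps `meta_data` with `statistics` and `encoding_stats` cleared.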