ggershinsky commented on PR #41821: URL: https://github.com/apache/arrow/pull/41821#issuecomment-2149011151
> Can we agree this proposal makes sense in principle? Assuming the collector can decrypt all files and encrypt them with a new key into a `_metadata` file of course.

My 2c. If I understand the proposal correctly, it does not require all parquet files to be encrypted with the same data keys. Instead, the collector process can decrypt the footer (and column metadata) of any parquet file, regardless of its data keys, because, for example, the collector is authorized for the footer and column master keys. Technically, this is done by getting a decryption properties object from the relevant crypto factory. Then, the collector uses the same crypto factory to create a new encryption properties object (with footer and column data keys, as required) and applies this object to all collected footers when writing them to the metadata file. Future readers can then read (or not read) the encrypted footers and column metadata/stats according to their authorization (checked automatically when calling the crypto factory, as usual).

If what I wrote is accurate, then the proposal sounds very good to me. A couple of technical points.

- On the reader side: if a reader is not authorized for a certain column, then the column_metadata/stats for this column should not be extracted from the metadata file. In other words, column metadata decryption should be reactive, done only if the column is projected in the reader query.
- If the parquet file encryption is configured with the "external key material" mode, then we need to make sure this mode works ok for the metadata file writing/reading. Maybe this mode can simply be turned off for the metadata files.
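
To illustrate the first technical point (reactive column metadata decryption), here is a minimal Python sketch. It is a toy model only: `ToyKms`, `EncryptedColumnMeta`, and `read_projected_stats` are hypothetical names invented for this example, no real encryption is performed, and none of this is the Arrow or Parquet API. It only shows the access pattern in which column metadata is decrypted solely for projected columns, so a reader lacking authorization for an unprojected column is not affected.

```python
from dataclasses import dataclass


class ToyKms:
    """Hypothetical stand-in for a KMS: maps master key IDs to authorized users."""

    def __init__(self, acl):
        self._acl = acl  # e.g. {"key_a": {"alice", "bob"}}

    def check(self, master_key_id, user):
        # Raise if the user may not use this master key.
        if user not in self._acl.get(master_key_id, set()):
            raise PermissionError(f"{user} is not authorized for {master_key_id}")


@dataclass
class EncryptedColumnMeta:
    """Column metadata/stats guarded by a column master key (no real crypto here)."""

    master_key_id: str
    payload: dict  # stands in for the encrypted metadata bytes

    def decrypt(self, kms, user):
        # The authorization check happens only when decryption is requested.
        kms.check(self.master_key_id, user)
        return dict(self.payload)


def read_projected_stats(metadata_file, projection, kms, user):
    """Decrypt column metadata only for the projected columns."""
    return {name: metadata_file[name].decrypt(kms, user) for name in projection}


kms = ToyKms({"key_a": {"alice", "bob"}, "key_b": {"alice"}})
metadata_file = {
    "a": EncryptedColumnMeta("key_a", {"min": 1, "max": 9}),
    "b": EncryptedColumnMeta("key_b", {"min": 0, "max": 5}),
}

# bob is not authorized for column "b", but a query projecting only "a" succeeds.
stats = read_projected_stats(metadata_file, ["a"], kms, "bob")
```

With eager decryption of every column, the same query would fail for `bob` even though column `b` is never used; projecting `b` as `bob` raises `PermissionError` in this sketch.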
