Hi all,

We are working on Parquet modular encryption, and are currently adding a high-level interface that allows encrypting and decrypting Parquet files via properties only (without calling the low-level API). In the Spark/parquet-mr domain, we're using the Hadoop configuration properties for that purpose. They are already passed from Spark to Parquet, and allow adding custom key-value properties that can carry the list of encrypted columns, key identities, etc., as described in this document: https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit?usp=sharing
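
As a rough illustration of the properties-only approach, here is a minimal Python sketch of a free key-value map that carries the encrypted columns and key identities. The property names (`encryption.footer.key`, `encryption.column.keys`), the value format, and the helper `parse_column_keys` are all hypothetical, made up for this example; they are not taken from the linked document or from any parquet-mr or pyarrow API.

```python
# Hypothetical free-form key-value properties. The names and the
# "key_id:col,col;key_id:col" value format are illustrative assumptions only.
properties = {
    "encryption.footer.key": "kf",
    "encryption.column.keys": "k1:ssn,credit_card;k2:salary",
}


def parse_column_keys(spec):
    """Map each column name to the identity of the key that encrypts it."""
    column_to_key = {}
    for group in spec.split(";"):
        group = group.strip()
        if not group:
            continue  # tolerate empty input and trailing separators
        key_id, _, columns = group.partition(":")
        for column in columns.split(","):
            column = column.strip()
            if column:
                column_to_key[column] = key_id.strip()
    return column_to_key


column_keys = parse_column_keys(properties["encryption.column.keys"])
print(column_keys)
```

The point of the sketch is only that a flat string-to-string map is enough to express this kind of configuration, which is why a Hadoop-style pass-through map is convenient.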

I'm not sufficiently familiar with the pandas/pyarrow/parquet-cpp ecosystem. Is there an analog of the Hadoop configuration there (a free-form key-value map, passed all the way down to parquet-cpp)? Or is there a more structured configuration object, to which we would need to add the encryption-related properties? All suggestions are welcome.

Cheers,
Gidon
