tcrasset opened a new issue, #38914:
URL: https://github.com/apache/arrow/issues/38914
### Describe the enhancement requested
In the C++ library, `EncryptionConfiguration` has a `uniform_encryption`
option, which encrypts all the columns and the footer with the same
encryption key. The alternative is to provide `column_keys`, a list of
columns with their respective encryption keys.
From `parquet/encryption/crypto_factory.h`:
```c++
struct PARQUET_EXPORT EncryptionConfiguration {
  explicit EncryptionConfiguration(const std::string& footer_key)
      : footer_key(footer_key) {}

  /// ID of the master key for footer encryption/signing
  std::string footer_key;

  /// List of columns to encrypt, with master key IDs (see HIVE-21848).
  /// Format: "masterKeyID:colName,colName;masterKeyID:colName..."
  /// Either
  /// (1) column_keys must be set
  /// or
  /// (2) uniform_encryption must be set to true
  /// If none of (1) and (2) are true, or if both are true, an exception will be
  /// thrown.
  std::string column_keys;

  /// Encrypt footer and all columns with the same encryption key.
  bool uniform_encryption = kDefaultUniformEncryption;
  ...
}
```
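To make the two rules in that comment concrete, here is a minimal sketch in plain Python. It does not call Arrow at all: `format_column_keys` and `check_encryption_mode` are hypothetical helper names, and the logic is only my reading of the header comment (the `masterKeyID:colName,...` string format and the "exactly one of (1) or (2)" rule), not the library's actual implementation.

```python
def format_column_keys(column_keys):
    """Serialize {master_key_id: [col, ...]} into the string format quoted
    in the header comment: "masterKeyID:colName,colName;masterKeyID:colName".
    Hypothetical helper, not part of Arrow."""
    return ";".join(
        f"{key_id}:{','.join(columns)}" for key_id, columns in column_keys.items()
    )


def check_encryption_mode(column_keys, uniform_encryption):
    """Mirror the documented rule: exactly one of column_keys or
    uniform_encryption must be set, otherwise raise."""
    has_column_keys = bool(column_keys)
    if has_column_keys == bool(uniform_encryption):
        raise ValueError(
            "set exactly one of column_keys or uniform_encryption"
        )


serialized = format_column_keys({"key1": ["a", "b"], "key2": ["c"]})
check_encryption_mode(serialized, uniform_encryption=False)  # per-column mode
check_encryption_mode("", uniform_encryption=True)           # uniform mode
```

Setting neither option, or both, raises `ValueError` in this sketch, which corresponds to the exception the comment describes.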
I'm using the Python wrapper around the C++ library, where
`uniform_encryption` is not yet present.
From `python/pyarrow/_parquet_encryption.pyx`:
```python
cdef class EncryptionConfiguration(_Weakrefable):
    """Configuration of the encryption, such as which columns to encrypt"""
    # Avoid mistakingly creating attributes
    __slots__ = ()

    def __init__(self, footer_key, *, column_keys=None,
                 encryption_algorithm=None,
                 plaintext_footer=None, double_wrapping=None,
                 cache_lifetime=None, internal_key_material=None,
                 data_key_length_bits=None):
        self.configuration.reset(
            new CEncryptionConfiguration(tobytes(footer_key)))
        if column_keys is not None:
            self.column_keys = column_keys
        if encryption_algorithm is not None:
            self.encryption_algorithm = encryption_algorithm
        if plaintext_footer is not None:
            self.plaintext_footer = plaintext_footer
        if double_wrapping is not None:
            self.double_wrapping = double_wrapping
        if cache_lifetime is not None:
            self.cache_lifetime = cache_lifetime
        if internal_key_material is not None:
            self.internal_key_material = internal_key_material
        if data_key_length_bits is not None:
            self.data_key_length_bits = data_key_length_bits
```
My use case requires encrypting all the columns, but I don't know the
column names in advance, because I'm streaming the file from an external
source, chunk by chunk.
Would it be possible to add `uniform_encryption` to the Python
implementation?
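For context, the only stopgap I can see today is to wait for the first chunk, read its column names, and list every one of them under the footer key. Below is a rough sketch, assuming a flat schema; `column_keys_for_all_columns`, the key ID, and the column names are hypothetical. As far as I understand, this is not equivalent to `uniform_encryption`, since each column would still get its own data key.

```python
def column_keys_for_all_columns(footer_key, column_names):
    """Build a column_keys mapping that lists every column under one
    master key ID. Hypothetical helper; assumes flat (non-nested) columns."""
    names = list(column_names)
    if not names:
        raise ValueError("no column names available yet")
    return {footer_key: names}


# Hypothetical column names, e.g. taken from the first streamed chunk's schema.
first_chunk_columns = ["id", "name", "amount"]
column_keys = column_keys_for_all_columns("footer_key_id", first_chunk_columns)

# With pyarrow installed, the mapping would then be passed as:
#   pyarrow.parquet.encryption.EncryptionConfiguration(
#       footer_key="footer_key_id", column_keys=column_keys)
```

This only works once the schema is known, which is awkward when the writer has to be configured before the first chunk arrives.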
### Component(s)
Parquet, Python