tcrasset opened a new issue, #38914:
URL: https://github.com/apache/arrow/issues/38914

   ### Describe the enhancement requested
   
   In the C++ library, `EncryptionConfiguration` has a `uniform_encryption` 
option, which encrypts all columns and the footer with the same encryption 
key. The alternative is to provide `column_keys`, a list of columns with their 
respective encryption keys.
   
   
   From `parquet/encryption/crypto_factory.h`:
   
   ```c++
   struct PARQUET_EXPORT EncryptionConfiguration {
     explicit EncryptionConfiguration(const std::string& footer_key)
         : footer_key(footer_key) {}
   
     /// ID of the master key for footer encryption/signing
     std::string footer_key;
   
     /// List of columns to encrypt, with master key IDs (see HIVE-21848).
     /// Format: "masterKeyID:colName,colName;masterKeyID:colName..."
     /// Either
     /// (1) column_keys must be set
     /// or
     /// (2) uniform_encryption must be set to true
     /// If none of (1) and (2) are true, or if both are true, an exception will be
     /// thrown.
     std::string column_keys;
   
     /// Encrypt footer and all columns with the same encryption key.
     bool uniform_encryption = kDefaultUniformEncryption;
     ...
   }
   ```
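
To make the either/or rule in that comment concrete, here is a minimal pure-Python sketch. It is not the pyarrow or C++ implementation: `EncryptionConfigSketch`, `validate`, and `format_column_keys` are hypothetical names, the `False` default for `uniform_encryption` is an assumption, and the string layout simply follows the `Format:` line quoted above.

```python
from dataclasses import dataclass


@dataclass
class EncryptionConfigSketch:
    """Hypothetical stand-in for the C++ struct; not a pyarrow class."""

    footer_key: str
    column_keys: str = ""
    uniform_encryption: bool = False  # assumed default, not taken from the source

    def validate(self) -> None:
        # Exactly one of the two options may be active: reject both and neither.
        has_column_keys = bool(self.column_keys)
        if has_column_keys == self.uniform_encryption:
            raise ValueError(
                "set exactly one of column_keys or uniform_encryption"
            )


def format_column_keys(mapping):
    """Serialize {master_key_id: [column, ...]} as 'id:col,col;id:col'."""
    return ";".join(
        f"{key_id}:{','.join(columns)}" for key_id, columns in mapping.items()
    )
```

With this sketch, `EncryptionConfigSketch("footer", uniform_encryption=True).validate()` passes without needing any column names, while leaving both options unset raises `ValueError`.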
   
   I'm using the Python wrapper around the C++ library, which does not yet 
expose `uniform_encryption`.
   
   From `python/pyarrow/_parquet_encryption.pyx`:
   ```python
   
   cdef class EncryptionConfiguration(_Weakrefable):
       """Configuration of the encryption, such as which columns to encrypt"""
       # Avoid mistakingly creating attributes
       __slots__ = ()
   
       def __init__(self, footer_key, *, column_keys=None,
                    encryption_algorithm=None,
                    plaintext_footer=None, double_wrapping=None,
                    cache_lifetime=None, internal_key_material=None,
                    data_key_length_bits=None):
           self.configuration.reset(
               new CEncryptionConfiguration(tobytes(footer_key)))
           if column_keys is not None:
               self.column_keys = column_keys
           if encryption_algorithm is not None:
               self.encryption_algorithm = encryption_algorithm
           if plaintext_footer is not None:
               self.plaintext_footer = plaintext_footer
           if double_wrapping is not None:
               self.double_wrapping = double_wrapping
           if cache_lifetime is not None:
               self.cache_lifetime = cache_lifetime
           if internal_key_material is not None:
               self.internal_key_material = internal_key_material
           if data_key_length_bits is not None:
               self.data_key_length_bits = data_key_length_bits
   
   ```
   
   My use case requires encrypting all the columns, but I don't know the 
column names in advance, as I'm streaming the file from an external source, 
chunk by chunk.
   
   Would it be possible to add `uniform_encryption` to the Python implementation?
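
For context, the only workaround I can think of without `uniform_encryption` is to peek at the first chunk to learn the column names and map all of them to one key. Here is a rough pure-Python sketch of that idea. The helper name `column_keys_from_first_chunk` is hypothetical, and it assumes each chunk is a dict mapping column name to values; real chunks would more likely be record batches, and nested columns may need different handling.

```python
import itertools


def column_keys_from_first_chunk(chunks, key_id):
    """Return ({key_id: [column names]}, iterator that replays every chunk).

    Assumes each chunk is a dict of column name -> list of values.
    """
    iterator = iter(chunks)
    first = next(iterator)  # raises StopIteration if the stream is empty
    column_keys = {key_id: list(first.keys())}
    # Put the consumed chunk back in front so the caller still sees all data.
    return column_keys, itertools.chain([first], iterator)


# Hypothetical stream of two chunks with the same columns.
stream = iter([{"a": [1, 2], "b": ["x", "y"]}, {"a": [3], "b": ["z"]}])
column_keys, replayed = column_keys_from_first_chunk(stream, "col_key")
```

The resulting dict has the `{master_key_id: [column names]}` shape that `column_keys` takes in the Python wrapper, but it means buffering a chunk before the writer can be configured, which `uniform_encryption` would avoid.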
   
   ### Component(s)
   
   Parquet, Python

