andersonm-ibm commented on a change in pull request #10450:
URL: https://github.com/apache/arrow/pull/10450#discussion_r770255581



##########
File path: docs/source/python/parquet.rst
##########
@@ -595,3 +595,172 @@ One example is Azure Blob storage, which can be interfaced through the
 
     abfs = AzureBlobFileSystem(account_name="XXXX", account_key="XXXX", container_name="XXXX")
     table = pq.read_table("file.parquet", filesystem=abfs)
+
+Parquet Modular Encryption (Columnar Encryption)
+------------------------------------------------
+
+Columnar encryption is supported for Parquet files in C++ starting from
+Apache Arrow 4.0.0 and in PyArrow starting from Apache Arrow 6.0.0.
+
+Parquet uses the envelope encryption practice, where file parts are encrypted
+with "data encryption keys" (DEKs), and the DEKs are encrypted with "master
+encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each
+encrypted file/column. The MEKs are generated, stored and managed in a Key
+Management Service (KMS) of the user’s choice.
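The envelope idea can be illustrated with a minimal, standard-library-only sketch. This is not PyArrow code and not how Parquet implements it: ``toy_xor_cipher`` and ``ToyKms`` are hypothetical names, and an HMAC-SHA256 keystream XOR stands in for AES-GCM purely so the example runs without third-party packages. It provides no integrity protection and must not be used to protect real data. What it does show is the data flow: a random DEK encrypts the column data, and only the DEK wrapped by a MEK held in the (stand-in) KMS is kept.

```python
import hashlib
import hmac
import os


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in for AES-GCM: XOR with an HMAC-SHA256 keystream.

    Illustration only: unauthenticated and NOT secure. Encrypting and
    decrypting are the same operation.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


class ToyKms:
    """Hypothetical KMS stand-in: master keys (MEKs) never leave this object."""

    def __init__(self):
        self._master_keys = {"mek_1": os.urandom(32)}

    def wrap_key(self, key_bytes, master_key_identifier):
        nonce = os.urandom(12)
        mek = self._master_keys[master_key_identifier]
        return nonce + toy_xor_cipher(mek, nonce, key_bytes)

    def unwrap_key(self, wrapped_key, master_key_identifier):
        nonce, body = wrapped_key[:12], wrapped_key[12:]
        mek = self._master_keys[master_key_identifier]
        return toy_xor_cipher(mek, nonce, body)


kms = ToyKms()

# Writer side: a fresh random DEK per column encrypts that column's data.
dek = os.urandom(16)
column_nonce = os.urandom(12)
ciphertext = toy_xor_cipher(dek, column_nonce, b"sensitive column values")
# Only the MEK-wrapped DEK is kept next to the data; the plain DEK is dropped.
wrapped_dek = kms.wrap_key(dek, "mek_1")
del dek

# Reader side: ask the KMS to unwrap the DEK, then decrypt the column.
recovered_dek = kms.unwrap_key(wrapped_dek, "mek_1")
plaintext = toy_xor_cipher(recovered_dek, column_nonce, ciphertext)
```

The method names ``wrap_key`` and ``unwrap_key`` mirror the informal KMS client interface described in the KMS Client section.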
+
+Reading and writing encrypted Parquet files involves passing file encryption
+and decryption properties to :class:`~pyarrow.parquet.ParquetWriter` and to
+:class:`~pyarrow.parquet.ParquetFile`, respectively.
+
+Writing an encrypted Parquet file:
+
+.. code-block:: python
+
+   encryption_properties = crypto_factory.file_encryption_properties(
+      kms_connection_config, encryption_config)
+   with pq.ParquetWriter(filename, schema,
+                         encryption_properties=encryption_properties) as writer:
+      writer.write_table(table)
+
+Reading an encrypted Parquet file:
+
+.. code-block:: python
+
+   decryption_properties = crypto_factory.file_decryption_properties(
+      kms_connection_config)
+   parquet_file = pq.ParquetFile(filename,
+                                 decryption_properties=decryption_properties)
+
+
+To create the encryption and decryption properties, a ``CryptoFactory`` should
+be created and initialized with KMS client details, as described below.
+
+
+KMS Client
+~~~~~~~~~~
+
+The master encryption keys must be kept and managed in a production-grade KMS
+deployed in the user's organization. Using Parquet encryption requires
+implementing a client class for the KMS server. Any ``KmsClient``
+implementation should provide the following informal interface:
+
+.. code-block:: python
+
+   class KmsClient:
+      def wrap_key(self, key_bytes, master_key_identifier):
+         """Wrap a key - encrypt it with the master key."""
+         raise NotImplementedError()
+
+      def unwrap_key(self, wrapped_key, master_key_identifier):
+         """Unwrap a key - decrypt it with the master key."""
+         raise NotImplementedError()
+
+
+   class MyKmsClient(pq.KmsClient):
+      """An example KmsClient implementation skeleton"""
+      def __init__(self, kms_connection_configuration):
+         pq.KmsClient.__init__(self)
+         # Any KMS-specific initialization based on
+         # kms_connection_configuration comes here
+
+      def wrap_key(self, key_bytes, master_key_identifier):
+         wrapped_key = ... # call KMS to wrap key_bytes with key specified by
+                           # master_key_identifier
+         return wrapped_key
+
+      def unwrap_key(self, wrapped_key, master_key_identifier):
+         key_bytes = ... # call KMS to unwrap wrapped_key with key specified by
+                         # master_key_identifier
+         return key_bytes
+
+The concrete implementation will be loaded at runtime by a factory method
+provided by the user. This factory method will be used to initialize the
+``CryptoFactory`` for creating file encryption and decryption properties.
+For example, in order to use the ``MyKmsClient`` defined above:
+
+.. code-block:: python
+
+   def kms_client_factory(kms_connection_configuration):
+      return MyKmsClient(kms_connection_configuration)
+
+   crypto_factory = pq.CryptoFactory(kms_client_factory)
+
+An :download:`example <../../../python/examples/parquet_encryption/sample_vault_kms_client.py>`
+of such a class for an open source
+`KMS <https://www.vaultproject.io/api/secret/transit>`_ can be found in the Apache
+Arrow GitHub repository. A production KMS client should be designed in
+cooperation with an organization's security administrators, and built by
+developers with experience in access control management. Once such a class is
+created, it can be passed to applications via a factory method and leveraged
+by general PyArrow users as shown in the encrypted Parquet write/read sample
+above.
+
+KMS connection configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The KMS connection configuration (``kms_connection_config``, used when
+creating file encryption and decryption properties) includes the following
+options:
+
+* ``kms_instance_url``, URL of the KMS instance.
+* ``kms_instance_id``, ID of the KMS instance that will be used for encryption
+  (if multiple KMS instances are available).
+* ``key_access_token``, authorization token that will be passed to KMS.
+* ``custom_kms_conf``, a string dictionary with KMS-type-specific
+  configuration.
+
+Encryption configuration
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Encryption configuration (``encryption_config`` used when creating file
+encryption properties) includes the following options:
+
+* ``footer_key``, ID of the master key for footer encryption/signing.
+* ``column_keys``, which columns to encrypt with which master key: a
+  dictionary mapping master key IDs to lists of column names,
+  e.g. ``{"key1": ["col1", "col2"], "key2": ["col3"]}``.
+* ``uniform_encryption``, whether to encrypt the footer and all columns with
+  the same encryption key.
+* ``encryption_algorithm``, the Parquet encryption algorithm.
+  Can be ``AES_GCM_V1`` (default) or ``AES_GCM_CTR_V1``.
+* ``plaintext_footer``, whether to write files with a plaintext footer
+  (otherwise the footer is encrypted).
+* ``double_wrapping``, whether to use double wrapping, where data encryption
+  keys (DEKs) are encrypted with key encryption keys (KEKs), which in turn are
+  encrypted with master keys. If set to ``False``, single wrapping is used,
+  where DEKs are encrypted directly with master keys.
+* ``cache_lifetime``, lifetime of cached entities (key encryption keys,
+  local wrapping keys, KMS client objects), as a ``datetime.timedelta``.
+* ``internal_key_material``, whether to store key material inside Parquet file
+  footers; this mode doesn’t produce additional files. If set to ``False``,
+  key material is stored in separate files in the same folder, which enables
+  key rotation for immutable Parquet files.
+* ``data_key_length_bits``, length of data encryption keys (DEKs), randomly
+  generated by Parquet key management tools. Can be 128, 192 or 256 bits.
+
+.. note::
+   By default, Parquet implements a "double envelope encryption" mode that
+   minimizes the interaction of the program with a KMS server. In this mode,
+   the DEKs are encrypted with "key encryption keys" (KEKs, randomly generated
+   by Parquet). The KEKs are encrypted with MEKs in the KMS; the result and the
+   KEK itself are cached in the process memory. Users interested in regular
+   envelope encryption can switch to it by setting the ``double_wrapping``
+   parameter of ``EncryptionConfiguration`` to ``False``.
+
+An example encryption configuration:
+
+.. code-block:: python
+
+   encryption_config = pq.EncryptionConfiguration(
+      footer_key="footer_key_name",
+      column_keys={
+         "column_key_name": ["Column1", "Column2"],
+      },
+   )
+
+Decryption configuration
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Decryption configuration (``decryption_config``, used when creating file
+decryption properties) is optional and includes the following options:
+
+* ``cache_lifetime``, lifetime of cached entities (key encryption keys, local
+  wrapping keys, KMS client objects).

Review comment:
       It's a datetime.timedelta. Adding it to the description.



