andersonm-ibm commented on a change in pull request #10450:
URL: https://github.com/apache/arrow/pull/10450#discussion_r736371258



##########
File path: docs/source/python/parquet.rst
##########
@@ -595,3 +595,172 @@ One example is Azure Blob storage, which can be interfaced through the
 
     abfs = AzureBlobFileSystem(account_name="XXXX", account_key="XXXX", container_name="XXXX")
     table = pq.read_table("file.parquet", filesystem=abfs)
+
+Parquet Modular Encryption (Columnar Encryption)
+------------------------------------------------
+
+Columnar encryption is supported for Parquet files in C++ starting from
+Apache Arrow 4.0.0 and in PyArrow starting from Apache Arrow 6.0.0.
+
+Parquet uses the envelope encryption practice, where file parts are encrypted
+with "data encryption keys" (DEKs), and the DEKs are encrypted with "master
+encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each
+encrypted file/column. The MEKs are generated, stored and managed in a Key
+Management Service (KMS) of the user's choice.
+
+Reading and writing encrypted Parquet files involves passing file encryption
+and decryption properties to :class:`~pyarrow.parquet.ParquetWriter` and to
+:class:`~pyarrow.parquet.ParquetFile`, respectively.
+
+Writing an encrypted Parquet file:
+
+.. code-block:: python
+
+   encryption_properties = crypto_factory.file_encryption_properties(
+                                    kms_connection_config, encryption_config)
+   with pq.ParquetWriter(filename, schema,
+                        encryption_properties=encryption_properties) as writer:
+      writer.write_table(table)
+
+Reading an encrypted Parquet file:
+
+.. code-block:: python
+
+   decryption_properties = crypto_factory.file_decryption_properties(
+                                                    kms_connection_config)
+   parquet_file = pq.ParquetFile(filename,
+                                 decryption_properties=decryption_properties)
+
+
+In order to create the encryption and decryption properties, a ``CryptoFactory``
+should be created and initialized with KMS Client details, as described below.
+
+
+KMS Client
+~~~~~~~~~~
+
+The master encryption keys must be kept and managed in a production-grade KMS
+system, deployed in the user's organization. Using Parquet encryption requires
+implementing a client class for the KMS server.
+Any ``KmsClient`` implementation should implement the following informal interface:
+
+.. code-block:: python
+
+   class KmsClient:
+      def wrap_key(self, key_bytes, master_key_identifier):
+         """Wrap a key - encrypt it with the master key."""
+         raise NotImplementedError()
+
+      def unwrap_key(self, wrapped_key, master_key_identifier):
+         """Unwrap a key - decrypt it with the master key."""
+         raise NotImplementedError()
+
+
+
+   class MyKmsClient(pq.KmsClient):
+      """An example KmsClient implementation skeleton"""
+      def __init__(self, kms_connection_configuration):
+         pq.KmsClient.__init__(self)
+         # Any KMS-specific initialization based on
+         # kms_connection_configuration comes here
+
+      def wrap_key(self, key_bytes, master_key_identifier):
+         wrapped_key = ... # call KMS to wrap key_bytes with key specified by
+                           # master_key_identifier
+         return wrapped_key
+
+      def unwrap_key(self, wrapped_key, master_key_identifier):
+         key_bytes = ... # call KMS to unwrap wrapped_key with key specified by
+                         # master_key_identifier
+         return key_bytes
+
+The concrete implementation will be loaded at runtime by a factory method
+provided by the user. This factory method will be used to initialize the
+``CryptoFactory`` for creating file encryption and decryption properties.
+For example, in order to use the ``MyKmsClient`` defined above:
+
+.. code-block:: python
+
+   def kms_client_factory(kms_connection_configuration):
+      return MyKmsClient(kms_connection_configuration)
+
+   crypto_factory = CryptoFactory(kms_client_factory)
+
+An :download:`example <../../../python/examples/parquet_encryption/sample_vault_kms_client.py>`
+of such a class for an open source
+`KMS <https://www.vaultproject.io/api/secret/transit>`_ can be found in the Apache
+Arrow GitHub repository. The production KMS client should be designed in
+cooperation with an organization's security administrators, and built by
+developers with experience in access control management. Once such a class is
+created, it can be passed to applications via a factory method and leveraged
+by general PyArrow users as shown in the encrypted parquet write/read sample
+above.
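For local experiments, the two-method interface and the factory pattern described above can be exercised without a real KMS. The following is a minimal sketch, not part of the PR: ``InMemoryKmsClient`` and the ``"master_keys"`` configuration entry are hypothetical names, the class deliberately does not subclass ``pq.KmsClient`` so that it runs with the standard library only, and the "wrapping" is plain concatenation plus base64, which provides no security at all.

```python
import base64


class InMemoryKmsClient:
    """Toy stand-in for a KMS client. For local tests only, NOT secure."""

    def __init__(self, master_keys):
        # master_keys: dict of master key identifier -> key bytes
        # (hypothetical layout, chosen for this sketch).
        self._master_keys = dict(master_keys)

    def wrap_key(self, key_bytes, master_key_identifier):
        # A real KMS would encrypt key_bytes with the master key.
        # Here the master key is merely prepended and the result encoded.
        master_key = self._master_keys[master_key_identifier]
        return base64.b64encode(master_key + key_bytes)

    def unwrap_key(self, wrapped_key, master_key_identifier):
        # Reverse of wrap_key: decode, check the master key prefix, strip it.
        master_key = self._master_keys[master_key_identifier]
        decoded = base64.b64decode(wrapped_key)
        if not decoded.startswith(master_key):
            raise ValueError("wrapped key does not match the master key")
        return decoded[len(master_key):]


def kms_client_factory(kms_connection_configuration):
    # Assumption: the configuration is a dict carrying the master keys.
    return InMemoryKmsClient(kms_connection_configuration["master_keys"])


# Round trip: a data encryption key (DEK) is wrapped, then unwrapped.
client = kms_client_factory({"master_keys": {"footer_key": b"0123456789012345"}})
dek = b"1234567890123450"
wrapped = client.wrap_key(dek, "footer_key")
assert client.unwrap_key(wrapped, "footer_key") == dek
```

A production client would replace the bodies of ``wrap_key`` and ``unwrap_key`` with calls to the organization's KMS and would never hold master keys in process memory.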
+
+KMS connection configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The KMS connection configuration (``kms_connection_config``, used when
+creating file encryption and decryption properties) includes the following
+options:
+
+* ``kms_instance_url``, URL of the KMS instance.
+* ``kms_instance_id``, ID of the KMS instance that will be used for encryption
+  (if multiple KMS instances are available).
+* ``key_access_token``, authorization token that will be passed to KMS.
+* ``custom_kms_conf``, a string dictionary with KMS-type-specific configuration.
+
+Encryption configuration
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Encryption configuration (``encryption_config`` used when creating file
+encryption properties) includes the following options:
+
+* ``footer_key``, ID of the master key for footer encryption/signing.
+* ``column_keys``, list of columns to encrypt, with master key IDs.
+* ``uniform_encryption``, encrypt footer and all columns with the same

Review comment:
       Thank you, @ggershinsky, for this reference. We opened a Jira issue for this (ARROW-14467), and we will stop exposing the uniform encryption feature in this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
