jorisvandenbossche commented on code in PR #34616:
URL: https://github.com/apache/arrow/pull/34616#discussion_r1294380534


##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -52,12 +52,81 @@ from pyarrow._parquet cimport (
     FileMetaData,
 )
 
+from pyarrow._parquet_encryption cimport *

Review Comment:
   This is the import that will fail at runtime.
   
   The way we solved this before for the plain Parquet (non-dataset) support is 
splitting it in a separate file (`_parquet_encryption.pyx` in addition to 
`_parquet.pyx`), and expose it in a submodule (`pyarrow.parquet.encryption`). 
   However, I assume that approach will not work here, given that it's the 
ParquetFragmentScanOptions et al classes really need to know about it (unwrap 
it to created the proper C++ counterpart of the options class).
   
   We might need some compile-time switches here, similarly as is done in the 
C++ code?



##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -729,6 +825,14 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions):
             other.thrift_container_size_limit)
         return attrs == other_attrs
 
+    def SetParquetDecryptionConfig(self, ParquetDecryptionConfig config):
+        cdef shared_ptr[CParquetDecryptionConfig] c_config
+        if not isinstance(config, ParquetDecryptionConfig):
+            raise ValueError("config must be a ParquetDecryptionConfig")
+        self._parquet_decryption_config = config
+        c_config = config.unwrap()
+        self.parquet_options.parquet_decryption_config = c_config

Review Comment:
   Is there a reason to not included this in the 
`@parquet_decryption_config.setter`? So that property setter can just contain 
this code, instead of calling `SetParquetDecryptionConfig`.



##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -52,12 +52,81 @@ from pyarrow._parquet cimport (
     FileMetaData,
 )
 
+from pyarrow._parquet_encryption cimport *
+
 
 cdef Expression _true = Expression._scalar(True)
 
 
 ctypedef CParquetFileWriter* _CParquetFileWriterPtr
 
+cdef class ParquetEncryptionConfig(_Weakrefable):

Review Comment:
   Question on naming: what is called here "config", is that more or less what 
maps to the "properties" in the single-file case? 
   The keyword in `pq.read_table` is called `decryption_properties` but here 
the keyword is `decryption_config` (below in the ParquetFragmentScanOptions). 
Could those two names be consolidated, or is that difference actually 
intentionally?



##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -52,12 +52,81 @@ from pyarrow._parquet cimport (
     FileMetaData,
 )
 
+from pyarrow._parquet_encryption cimport *
+
 
 cdef Expression _true = Expression._scalar(True)
 
 
 ctypedef CParquetFileWriter* _CParquetFileWriterPtr
 
+cdef class ParquetEncryptionConfig(_Weakrefable):
+    cdef:
+        shared_ptr[CParquetEncryptionConfig] c_config
+
+    # Avoid mistakenly creating attributes
+    __slots__ = ()
+
+    def __cinit__(self, object crypto_factory, object kms_connection_config,
+                  object encryption_config):
+
+        cdef shared_ptr[CEncryptionConfiguration] c_encryption_config
+
+        if encryption_config is None:
+            raise ValueError(
+                "encryption_config cannot be None")
+
+        self.c_config.reset(new CParquetEncryptionConfig())
+
+        c_encryption_config = 
pyarrow_unwrap_encryptionconfig(encryption_config)
+
+        self.c_config.get().Setup(pyarrow_unwrap_cryptofactory(crypto_factory),
+                                  pyarrow_unwrap_kmsconnectionconfig(
+                                      kms_connection_config),
+                                  c_encryption_config)
+
+    @staticmethod
+    cdef wrap(shared_ptr[CParquetEncryptionConfig] c_config):
+        cdef ParquetEncryptionConfig python_config = 
ParquetEncryptionConfig.__new__(ParquetEncryptionConfig)
+        python_config.c_config = c_config
+        return python_config
+
+    cdef shared_ptr[CParquetEncryptionConfig] unwrap(self):
+        return self.c_config
+
+cdef class ParquetDecryptionConfig(_Weakrefable):

Review Comment:
   ```suggestion
   
   
   cdef class ParquetDecryptionConfig(_Weakrefable):
   ```
   
   (small formatting nit: two blank lines between class definitions; same for 
other cases further in this file)



##########
python/pyarrow/dataset_api.pxi:
##########
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from libcpp.memory cimport shared_ptr
+
+
+cdef api bint pyarrow_is_cryptofactory(object crypto_factory):
+    return isinstance(crypto_factory, CryptoFactory)
+
+cdef api shared_ptr[CCryptoFactory] pyarrow_unwrap_cryptofactory(object 
crypto_factory):

Review Comment:
   Can you expand a bit on why those functions were put in this separate file? 
(and if it is needed, it would be good to add some context for it as a comment 
in the file)



##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -52,12 +52,81 @@ from pyarrow._parquet cimport (
     FileMetaData,
 )
 
+from pyarrow._parquet_encryption cimport *
+
 
 cdef Expression _true = Expression._scalar(True)
 
 
 ctypedef CParquetFileWriter* _CParquetFileWriterPtr
 
+cdef class ParquetEncryptionConfig(_Weakrefable):

Review Comment:
   Can you add a docstring to this class? (and same for ParquetDecryptionConfig)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to