pitrou commented on code in PR #39438:
URL: https://github.com/apache/arrow/pull/39438#discussion_r1504331736
##########
python/pyarrow/parquet/core.py:
##########
@@ -1327,17 +1327,15 @@ def __init__(self, path_or_paths, filesystem=None,
schema=None, *, filters=None,
except ValueError:
filesystem = LocalFileSystem(use_mmap=memory_map)
finfo = filesystem.get_file_info(path_or_paths)
- if finfo.is_file:
- single_file = path_or_paths
if finfo.type == FileType.Directory:
self._base_dir = path_or_paths
else:
- single_file = path_or_paths
+ buffer_reader = path_or_paths
parquet_format = ds.ParquetFileFormat(**read_options)
- if single_file is not None:
- fragment = parquet_format.make_fragment(single_file, filesystem)
+ if buffer_reader is not None:
+ fragment = parquet_format.make_fragment(buffer_reader, filesystem)
Review Comment:
Can you please remove these unrelated changes?
##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -530,3 +530,45 @@ def kms_factory(kms_connection_configuration):
path, decryption_properties=file_decryption_properties)
result_table = result.read(use_threads=True)
assert data_table.equals(result_table)
+
+
+def test_encrypted_parquet_read_table(tempdir, data_table,
basic_encryption_config):
+ """Write an encrypted parquet then read it back using read_table."""
+ path = tempdir / 'encrypted_table_read_table.parquet'
Review Comment:
This is copy-pasting almost all the same code from other test functions, can
you please factor out the common steps?
##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -149,6 +155,7 @@ cdef class ParquetFileFormat(FileFormat):
self.init(<shared_ptr[CFileFormat]> wrapped)
self.default_fragment_scan_options = default_fragment_scan_options
+ self._set_default_fragment_scan_options(default_fragment_scan_options)
Review Comment:
Why is it necessary to add this here?
##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -808,6 +815,18 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions):
raise ValueError("size must be larger than zero")
self.reader_properties().set_thrift_container_size_limit(size)
+ def set_file_decryption_properties(self, FileDecryptionProperties
decryption_properties):
Review Comment:
Several things:
1) will this work if Parquet encryption is not enabled at compile time? (see
how `parquet_decryption_config` works)
2) can you make this a read/write property like the other options here?
3) can it also be added to the constructor like the other options?
4) can you add unit tests for it? see existing test in
https://github.com/apache/arrow/blob/40a8a68c7c3903e2bad98605d82339460c5ea930/python/pyarrow/tests/test_dataset_encryption.py#L131
5) how does this interact with `decryption_config`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]