AlenkaF commented on code in PR #39438:
URL: https://github.com/apache/arrow/pull/39438#discussion_r1504063398
##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -127,8 +127,14 @@ cdef class ParquetFileFormat(FileFormat):
'instance of ParquetReadOptions')
if default_fragment_scan_options is None:
- default_fragment_scan_options = ParquetFragmentScanOptions(
- **scan_args)
+ # remove decryption_properties from scan_args as it does not take
this parameter
+ decryption_properties = scan_args.pop('decryption_properties',
None)
+ default_fragment_scan_options =
ParquetFragmentScanOptions(**scan_args)
+ # make sure scan options has decryption properties
+ if decryption_properties is not None:
+ default_fragment_scan_options.set_file_decryption_properties(
+ decryption_properties)
+
elif isinstance(default_fragment_scan_options, dict):
Review Comment:
Should we also make sure that `decryption_properties` are set if
`default_fragment_scan_options` are specified as a dictionary together with
`decryption_properties`?
Though, if the dictionary is used to pass options to `ParquetFileFormat` it
will be considered as `**kwargs` so I am not even sure if there is any use case
to support `default_fragment_scan_options` being a dictionary?
Also, would it be better to add `decryption_properties` to
`ParquetFragmentScanOptions` `reader_properties()`? This way the logic for
removing `decryption_properties` from `scan_args` and setting them manually via
`set_file_decryption_properties` would not be needed?
##########
python/pyarrow/parquet/core.py:
##########
@@ -1327,17 +1327,15 @@ def __init__(self, path_or_paths, filesystem=None,
schema=None, *, filters=None,
except ValueError:
filesystem = LocalFileSystem(use_mmap=memory_map)
finfo = filesystem.get_file_info(path_or_paths)
- if finfo.is_file:
- single_file = path_or_paths
Review Comment:
Oh, I think I get it now! 😊
Do we have an existing test for the BufferReader case and if not, can we add
one?
Also, can we then change the comment in line 1316: "# check for single
fragment dataset" to something that describes this part of the code correctly,
like "check for BufferReader or dataset directory"? (not sure I got this right
though).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]