AlenkaF commented on code in PR #39438:
URL: https://github.com/apache/arrow/pull/39438#discussion_r1504063398


##########
python/pyarrow/_dataset_parquet.pyx:
##########
@@ -127,8 +127,14 @@ cdef class ParquetFileFormat(FileFormat):
                             'instance of ParquetReadOptions')
 
         if default_fragment_scan_options is None:
-            default_fragment_scan_options = ParquetFragmentScanOptions(
-                **scan_args)
+            # remove decryption_properties from scan_args as it does not take 
this parameter
+            decryption_properties = scan_args.pop('decryption_properties', 
None)
+            default_fragment_scan_options = 
ParquetFragmentScanOptions(**scan_args)
+            # make sure scan options has decryption properties
+            if decryption_properties is not None:
+                default_fragment_scan_options.set_file_decryption_properties(
+                    decryption_properties)
+
         elif isinstance(default_fragment_scan_options, dict):

Review Comment:
   Should we also make sure that `decryption_properties` are set if 
`default_fragment_scan_options` are specified as a dictionary together with  
`decryption_properties`?
   
   Though, if the dictionary is used to pass options to `ParquetFileFormat` it 
will be considered as `**kwargs` so I am not even sure if there is any use case 
to support `default_fragment_scan_options` being a dictionary?
   
   Also, would it be better to add `decryption_properties` to 
`ParquetFragmentScanOptions` `reader_properties()`? This way the logic for 
removing `decryption_properties` from `scan_args` and setting them manually via 
`set_file_decryption_properties` would not be needed?



##########
python/pyarrow/parquet/core.py:
##########
@@ -1327,17 +1327,15 @@ def __init__(self, path_or_paths, filesystem=None, 
schema=None, *, filters=None,
                     except ValueError:
                         filesystem = LocalFileSystem(use_mmap=memory_map)
                 finfo = filesystem.get_file_info(path_or_paths)
-                if finfo.is_file:
-                    single_file = path_or_paths

Review Comment:
   Oh, I think I get it now! 😊 
   Do we have an existing test for the BufferReader case and if not, can we add 
one?
   
   Also, can we then change the comment in line 1316: "# check for single 
fragment dataset" to something that describes this part of the code correctly, 
like "check for BufferReader or dataset directory"? (not sure I got this right 
though).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to