jorisvandenbossche commented on a change in pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#discussion_r442172970
##########
File path: python/pyarrow/parquet.py
##########
@@ -1390,6 +1390,26 @@ def __init__(self, path_or_paths, filesystem=None, filters=None,
                     "Keyword '{0}' is not yet supported with the new "
                     "Dataset API".format(keyword))

+        # map format arguments
+        read_options = {}
+        if buffer_size:
+            read_options.update(use_buffered_stream=True,
+                                buffer_size=buffer_size)
+        if read_dictionary is not None:
+            read_options.update(dictionary_columns=read_dictionary)
+        parquet_format = ds.ParquetFileFormat(read_options=read_options)
+
+        # map filters to Expressions
+        self._filters = filters
+        self._filter_expression = filters and _filters_to_expression(filters)
+
+        # check for single NativeFile dataset
+        if not isinstance(path_or_paths, list):
+            if not _is_path_like(path_or_paths):
+                self._fragment = parquet_format.make_fragment(path_or_paths)
+                self._dataset = None

Review comment:
   I think it will be cleaner to make a Dataset from this single fragment. That's not yet possible from Python right now (I have a PR for it), so this can be done as a follow-up.
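As context for the suggestion, here is a minimal sketch of what wrapping the single fragment in a Dataset could look like, assuming a later pyarrow version where `ds.FileSystemDataset` can be constructed from a list of fragments (as the comment notes, that was not yet exposed from Python at the time of this review). The in-memory buffer, column name, filter, and exact constructor signature are illustrative assumptions, not the PR's actual implementation.

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Purely illustrative data: write a tiny Parquet table into an in-memory
# buffer to stand in for the single NativeFile input handled in the diff.
sink = pa.BufferOutputStream()
pq.write_table(pa.table({"a": [1, 2, 3]}), sink)
single_file = pa.BufferReader(sink.getvalue())

# The PR maps read_dictionary/buffer_size into ParquetFileFormat read options
# here; omitted for brevity in this sketch.
parquet_format = ds.ParquetFileFormat()
fragment = parquet_format.make_fragment(single_file)

# Wrap the single fragment in a dataset instead of carrying the fragment
# around separately. NOTE: this constructor form (a list of fragments plus an
# explicit schema) is an assumption based on later pyarrow releases.
dataset = ds.FileSystemDataset(
    [fragment],
    schema=fragment.physical_schema,
    format=parquet_format,
)

# Filters then go through the usual Dataset scan path, analogous to the
# _filters_to_expression mapping in the diff above.
print(dataset.to_table(filter=ds.field("a") > 1))
```

With the single-file case represented as a one-fragment dataset, the rest of the `ParquetDataset` code could treat the path-based and NativeFile-based inputs uniformly instead of branching on `self._dataset is None`.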