jorisvandenbossche commented on a change in pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#discussion_r442172970
##########
File path: python/pyarrow/parquet.py
##########
@@ -1390,6 +1390,26 @@ def __init__(self, path_or_paths, filesystem=None, filters=None,
                     "Keyword '{0}' is not yet supported with the new "
                     "Dataset API".format(keyword))

+        # map format arguments
+        read_options = {}
+        if buffer_size:
+            read_options.update(use_buffered_stream=True,
+                                buffer_size=buffer_size)
+        if read_dictionary is not None:
+            read_options.update(dictionary_columns=read_dictionary)
+        parquet_format = ds.ParquetFileFormat(read_options=read_options)
+
+        # map filters to Expressions
+        self._filters = filters
+        self._filter_expression = filters and _filters_to_expression(filters)
+
+        # check for single NativeFile dataset
+        if not isinstance(path_or_paths, list):
+            if not _is_path_like(path_or_paths):
+                self._fragment = parquet_format.make_fragment(path_or_paths)
+                self._dataset = None

Review comment:
   I think it will be cleaner to make a Dataset from this single fragment. That's not yet possible from Python right now (I have a PR for it), so this can be done as a follow-up.
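As context for the suggestion, here is a minimal sketch of what wrapping the single fragment in a Dataset could look like, assuming a later pyarrow version where `ds.FileSystemDataset` can be constructed from a list of fragments (as the comment notes, that was not yet exposed from Python at the time of this review). The in-memory buffer, column name, filter, and exact constructor signature are illustrative assumptions, not the PR's actual implementation.

```python
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Purely illustrative data: write a tiny Parquet table into an in-memory
# buffer to stand in for the single NativeFile input handled in the diff.
sink = pa.BufferOutputStream()
pq.write_table(pa.table({"a": [1, 2, 3]}), sink)
single_file = pa.BufferReader(sink.getvalue())

# The PR maps read_dictionary/buffer_size into ParquetFileFormat read options
# here; omitted for brevity in this sketch.
parquet_format = ds.ParquetFileFormat()
fragment = parquet_format.make_fragment(single_file)

# Wrap the single fragment in a dataset instead of carrying the fragment
# around separately. NOTE: this constructor form (a list of fragments plus an
# explicit schema) is an assumption based on later pyarrow releases.
dataset = ds.FileSystemDataset(
    [fragment],
    schema=fragment.physical_schema,
    format=parquet_format,
)

# Filters then go through the usual Dataset scan path, analogous to the
# _filters_to_expression mapping in the diff above.
print(dataset.to_table(filter=ds.field("a") > 1))
```

With the single-file case represented as a one-fragment dataset, the rest of the `ParquetDataset` code could treat the path-based and NativeFile-based inputs uniformly instead of branching on `self._dataset is None`.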