[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10074: ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet

GitBox Mon, 19 Apr 2021 05:14:24 -0700


jorisvandenbossche commented on a change in pull request #10074:
URL: https://github.com/apache/arrow/pull/10074#discussion_r615794180




##########
File path: python/pyarrow/parquet.py
##########
@@ -1674,6 +1686,12 @@ def pieces(self):
     keys and only a hive-style directory structure is supported. When
     setting `use_legacy_dataset` to False, also within-file level filtering
     and different partitioning schemes are supported.
+pre_buffer : bool, default True
+    Coalesce and issue file reads in parallel to improve performance on

Review comment:
   > I guess for someone who wants Arrow to be truly single-threaded, this may be confusing, so I'll see if I can reword this.

   So in that case, you would need to set `pre_buffer=False` to make the read truly single-threaded?
   I am thinking of dask, where `use_threads` is set to False, but I am not sure whether it would be a problem for them if the I/O is still done in parallel.
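
   To make the question concrete, here is a minimal sketch of what a caller would write if the answer is yes. It assumes `pq.read_table` accepts the `pre_buffer` keyword as exposed by this PR; the two helper names are hypothetical and are not part of pyarrow or dask.

```python
def single_threaded_read_options():
    """Keyword arguments that switch off both kinds of parallelism.

    Hypothetical helper for illustration; not part of pyarrow or dask.
    """
    return {
        "use_threads": False,  # disable multi-threaded column reads
        "pre_buffer": False,   # assumption: also disables the parallel, coalesced file reads
    }


def read_table_single_threaded(path):
    """Read a Parquet file with the options above applied."""
    # Imported lazily so the options helper can be inspected without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_table(path, **single_threaded_read_options())
```

   A caller would then use `read_table_single_threaded("example.parquet")` (placeholder path). Whether `pre_buffer=False` really is required for that guarantee is exactly what I am asking.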




