jorisvandenbossche commented on issue #36765: URL: https://github.com/apache/arrow/issues/36765#issuecomment-1733375154
I was chatting about this issue with some people at PyData Amsterdam, and was planning to make a PR to just switch the default once I was back, so here it is: https://github.com/apache/arrow/pull/37854

That only changes the default for Python (`pyarrow.dataset`), but should we also change the default in C++? From a basic check, it seems the R code already sets it by default (this was changed a while ago in https://github.com/apache/arrow/pull/11386).

I noticed that the R PR also set the `cache_options` to `LazyDefaults`. Is that also something we want to change on the Python/C++ side? (The current default is `CacheOptions::Defaults()`.)

Another useful reference for the above discussion is https://github.com/apache/arrow/issues/28218 (https://issues.apache.org/jira/browse/ARROW-12428), where @lidavidm did some benchmarks with `pre_buffer` enabled/disabled, and which was the reason for exposing the `pre_buffer` option in `pyarrow.parquet` with a default of True (https://github.com/apache/arrow/pull/10074).
