[ https://issues.apache.org/jira/browse/ARROW-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424119#comment-17424119 ]
David Li commented on ARROW-14025:
----------------------------------
Something like this:
{code:cpp}
diff --git a/r/src/compute-exec.cpp b/r/src/compute-exec.cpp
index c61f7a3d1..cd34ad42f 100644
--- a/r/src/compute-exec.cpp
+++ b/r/src/compute-exec.cpp
@@ -114,6 +114,15 @@ std::shared_ptr<compute::ExecNode> ExecNode_Scan(
   // TODO: pass in FragmentScanOptions
   auto options = std::make_shared<arrow::dataset::ScanOptions>();
+  if (dataset->type_name() == "filesystem") {
+    const auto& fs_dataset = static_cast<const arrow::dataset::FileSystemDataset&>(*dataset);
+    if (fs_dataset.format()->type_name() == "parquet") {
+      auto fragment_scan_options = std::make_shared<arrow::dataset::ParquetFragmentScanOptions>();
+      fragment_scan_options->arrow_reader_properties->set_pre_buffer(true);
+      fragment_scan_options->arrow_reader_properties->set_cache_options(arrow::io::CacheOptions::LazyDefaults());
+      options->fragment_scan_options = std::move(fragment_scan_options);
+    }
+  }
   options->use_async = true;
   options->use_threads = arrow::r::GetBoolOption("arrow.use_threads", true);
{code}
> [R][C++] PreBuffer is not enabled when scanning parquet via exec nodes
> ----------------------------------------------------------------------
>
> Key: ARROW-14025
> URL: https://issues.apache.org/jira/browse/ARROW-14025
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Reporter: Weston Pace
> Assignee: Neal Richardson
> Priority: Major
> Fix For: 6.0.0
>
>
> In ExecNode_Scan a ScanOptions object is built up. If we are reading parquet
> we should enable pre-buffering. This is done by creating a
> ParquetFragmentScanOptions object and enabling pre-buffering.
> Alternatively, we could just default pre-buffering to true for asynchronous
> scans of parquet data.