[ https://issues.apache.org/jira/browse/ARROW-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424119#comment-17424119 ]

David Li commented on ARROW-14025:
----------------------------------

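Something like this, which attaches ParquetFragmentScanOptions with pre-buffering enabled when the dataset is a Parquet FileSystemDataset: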

{code:cpp}
diff --git a/r/src/compute-exec.cpp b/r/src/compute-exec.cpp
index c61f7a3d1..cd34ad42f 100644
--- a/r/src/compute-exec.cpp
+++ b/r/src/compute-exec.cpp
@@ -114,6 +114,15 @@ std::shared_ptr<compute::ExecNode> ExecNode_Scan(
 
   // TODO: pass in FragmentScanOptions
   auto options = std::make_shared<arrow::dataset::ScanOptions>();
+  if (dataset->type_name() == "filesystem") {
+    const auto& fs_dataset = static_cast<const arrow::dataset::FileSystemDataset&>(*dataset);
+    if (fs_dataset.format()->type_name() == "parquet") {
+      auto fragment_scan_options = std::make_shared<arrow::dataset::ParquetFragmentScanOptions>();
+      fragment_scan_options->arrow_reader_properties->set_pre_buffer(true);
+      fragment_scan_options->arrow_reader_properties->set_cache_options(arrow::io::CacheOptions::LazyDefaults());
+      options->fragment_scan_options = std::move(fragment_scan_options);
+    }
+  }
 
   options->use_async = true;
   options->use_threads = arrow::r::GetBoolOption("arrow.use_threads", true);
{code}
 

> [R][C++] PreBuffer is not enabled when scanning parquet via exec nodes
> ----------------------------------------------------------------------
>
>                 Key: ARROW-14025
>                 URL: https://issues.apache.org/jira/browse/ARROW-14025
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Weston Pace
>            Assignee: Neal Richardson
>            Priority: Major
>             Fix For: 6.0.0
>
>
> In ExecNode_Scan a ScanOptions object is built up.  If we are reading parquet 
> we should enable pre-buffering.  This is done by creating a 
> ParquetFragmentScanOptions object and enabling pre-buffering.
> Alternatively, we could just default pre-buffering to true for asynchronous 
> scans of parquet data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
