westonpace commented on a change in pull request #11285:
URL: https://github.com/apache/arrow/pull/11285#discussion_r720449458
##########
File path: cpp/src/arrow/dataset/scanner.cc
##########
@@ -593,20 +593,23 @@ Result<EnumeratedRecordBatchGenerator> AsyncScanner::ScanBatchesUnorderedAsync(
   ARROW_ASSIGN_OR_RAISE(auto plan, compute::ExecPlan::Make(exec_context.get()));
   AsyncGenerator<util::optional<compute::ExecBatch>> sink_gen;
+  util::BackpressureOptions backpressure =
+      util::MakeBackpressureOptions(kDefaultBackpressureLow, kDefaultBackpressureHigh);
Review comment:
I don't want to add too many tuning parameters to scan options. I think
I'd wait until these defaults don't work for someone before exposing them. At
the moment I couldn't give any reasonable advice on how to tune these.
Increasing the values would increase the amount of memory used, but I don't
believe it would have any significant impact on performance in most cases. The
backpressure limit in the dataset writer is also hidden from the user. It
might be that we want a single parameter to influence backpressure tuning
settings across the board.
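
To illustrate the memory point, here is a minimal sketch of low/high watermark backpressure. The class and function names are hypothetical and this is not Arrow's `util::BackpressureOptions` implementation. It assumes the producer pauses once the queue reaches the high mark and resumes once it drains to the low mark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; not Arrow's actual backpressure code.
class WatermarkBackpressure {
 public:
  WatermarkBackpressure(std::size_t low, std::size_t high) : low_(low), high_(high) {
    assert(low < high);
  }

  // Producer enqueued one batch; pause once the high mark is reached.
  void OnPush() {
    ++queued_;
    if (queued_ >= high_) paused_ = true;
  }

  // Consumer dequeued one batch; resume once drained to the low mark.
  void OnPop() {
    assert(queued_ > 0);
    --queued_;
    if (queued_ <= low_) paused_ = false;
  }

  bool paused() const { return paused_; }
  std::size_t queued() const { return queued_; }

 private:
  std::size_t low_;
  std::size_t high_;
  std::size_t queued_ = 0;
  bool paused_ = false;
};

// Worst case for memory: the producer tries `attempts` pushes and nothing is
// consumed. Returns the peak number of queued batches.
std::size_t PeakDepthWithoutConsumer(std::size_t low, std::size_t high,
                                     std::size_t attempts) {
  WatermarkBackpressure bp(low, high);
  std::size_t peak = 0;
  for (std::size_t i = 0; i < attempts; ++i) {
    if (!bp.paused()) bp.OnPush();
    peak = std::max(peak, bp.queued());
  }
  return peak;
}
```

In this sketch the peak queue depth is capped by the high mark, so raising the limits mostly raises the worst-case memory held in the queue. The gap between the two marks only controls how often the producer is toggled.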