westonpace commented on a change in pull request #12533:
URL: https://github.com/apache/arrow/pull/12533#discussion_r818123668
##########
File path: r/src/compute-exec.cpp
##########
@@ -277,4 +277,18 @@ std::shared_ptr<compute::ExecNode> ExecNode_ReadFromRecordBatchReader(
   return MakeExecNodeOrStop("source", plan.get(), {}, options);
 }
+// [[arrow::export]]
+std::shared_ptr<compute::ExecNode> ExecNode_TableSourceNode(
+    const std::shared_ptr<compute::ExecPlan>& plan,
+    const std::shared_ptr<arrow::Table>& table) {
+  arrow::compute::TableSourceNodeOptions options{
+      /*table=*/table,
+      // TODO: make batch_size configurable
+      /*batch_size=*/1048576
Review comment:
So I was doing some related testing of the table source node yesterday,
and I think we are going to rename this option to `max_batch_size`,
because it doesn't actually concatenate small batches; it only splits up
large ones. I'm not mentioning this to delay this PR. Proceed as-is, and
we will adjust the R linkage in the PR that does the rename.

Mostly I just want to point out that batch size will still largely be
controlled by the customer through how they create their table (e.g. if
they create a table with many small batches, then we will feed many small
batches into the exec plan). This option is really only a failsafe, to
make sure we still get some parallelism when the customer's table has
very large batches.
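
To make the semantics concrete, here is a minimal C++ sketch (not part of
this PR) of how a table source node gets wired up with these options. It
assumes the Arrow C++ exec-plan API as of this writing: `TableSourceNodeOptions`,
the "table_source" factory name, and the `batch_size` field that is slated
to become `max_batch_size`. The helper `MakeTableSource` is hypothetical,
purely for illustration.

#include <arrow/compute/exec/exec_plan.h>
#include <arrow/compute/exec/options.h>
#include <arrow/result.h>
#include <arrow/table.h>

// Hypothetical helper, for illustration only.
arrow::Result<arrow::compute::ExecNode*> MakeTableSource(
    arrow::compute::ExecPlan* plan, std::shared_ptr<arrow::Table> table) {
  // batch_size (soon max_batch_size) is only an upper bound: chunks already
  // smaller than this are emitted as-is; only larger chunks get split.
  arrow::compute::TableSourceNodeOptions options{std::move(table),
                                                 /*batch_size=*/1 << 20};
  return arrow::compute::MakeExecNode("table_source", plan, /*inputs=*/{},
                                      options);
}

So if the customer builds their table from, say, 1000 batches of 1000 rows
each, the plan sees 1000 small batches regardless of this setting; the
1 << 20 (1048576) cap only kicks in for oversized chunks.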