[
https://issues.apache.org/jira/browse/ARROW-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501070#comment-17501070
]
Weston Pace commented on ARROW-15820:
-------------------------------------
We probably don't want to wait on ARROW-15261. The change is dependent on
ARROW-12311 and I don't know if that issue is going to be tackled anytime soon.
Otherwise, that all looks good to me. I'll look over the PR.
> [C++][Doc] Add table_source to streaming_execution.rst & clarify parameter name
> -------------------------------------------------------------------------------
>
> Key: ARROW-15820
> URL: https://issues.apache.org/jira/browse/ARROW-15820
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently the table_source node does not appear in our documentation.
> Also, in {{TableSourceNodeOptions}} we have:
> {noformat}
> // Size of batches to emit from this node
> // If the table is larger the node will emit multiple batches from the
> // table to be processed in parallel.
> int64_t batch_size;
> {noformat}
> However, when looking into a performance issue today, I realized this
> description is incomplete. In reality we should probably call this parameter
> {{max_batch_size}}.
> Furthermore, we should make it clear that a table with smaller batches will
> emit smaller batches directly (this is a good thing in my case) and will not
> concatenate small batches together into a larger batch.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)