Weston Pace created ARROW-15820:
-----------------------------------

             Summary: [C++][Doc] Add table_source to streaming_execution.rst & clarify parameter name
                 Key: ARROW-15820
                 URL: https://issues.apache.org/jira/browse/ARROW-15820
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


Currently the table_source node does not appear in our documentation.

Also, in {{TableSourceNodeOptions}} we have:

{noformat}
  // Size of batches to emit from this node
  // If the table is larger the node will emit multiple batches from
  // the table to be processed in parallel.
  int64_t batch_size;
{noformat}

However, while looking into a performance issue today, I realized this 
description is incomplete: {{batch_size}} is only an upper bound on the size 
of the emitted batches.  We should probably rename this parameter to 
{{max_batch_size}}.

Furthermore, we should make it clear that a table whose batches are already 
smaller than this limit will emit those smaller batches directly (this is a 
good thing in my case).  The node will not concatenate small batches together 
into a larger batch.
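
To make the intended wording concrete, here is a minimal sketch of the semantics described above. It is a standalone model, not Arrow code: {{EmittedBatchLengths}} is a hypothetical helper, and it assumes every column of the table shares the same chunk layout, so each chunk can be described by a single row count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): models how many rows each emitted
// batch would contain if the option acts purely as an upper bound.
// Assumption: all columns share one chunk layout, given as row counts.
std::vector<int64_t> EmittedBatchLengths(const std::vector<int64_t>& chunk_lengths,
                                         int64_t max_batch_size) {
  assert(max_batch_size > 0);
  std::vector<int64_t> emitted;
  for (int64_t remaining : chunk_lengths) {
    // An oversized chunk is split into pieces of at most max_batch_size rows.
    while (remaining > 0) {
      const int64_t n = std::min(remaining, max_batch_size);
      emitted.push_back(n);
      remaining -= n;
    }
    // A chunk (or leftover piece) smaller than max_batch_size is emitted
    // as-is; nothing is carried over and merged with the next chunk.
  }
  return emitted;
}
```

Under this model, a table with chunks of 10, 3 and 25 rows and a limit of 10 would be emitted as batches of 10, 3, 10, 10 and 5 rows. The 3-row and 5-row batches are never combined with their neighbours.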



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
