Larry White created ARROW-17346:
-----------------------------------

             Summary: Document the use of the batchSize argument in Dataset ScanOptions
                 Key: ARROW-17346
                 URL: https://issues.apache.org/jira/browse/ARROW-17346
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Java
    Affects Versions: 9.0.0
            Reporter: Larry White
            Assignee: Larry White


Several ScanOptions constructors take a batchSize argument, for example:

{code:java}
public ScanOptions(long batchSize) {
    this(batchSize, Optional.empty());
}
{code}

Since the scanner reads one ArrowRecordBatch per load invocation, setting the
parameter to a size larger than the RecordBatch has no effect. It only takes
effect when it is smaller than the number of rows in the RecordBatch (i.e., the
number of records read is equal to min(batchSize, recordBatch rowCount)), which
can lead to some confusion.
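
To make the min(batchSize, recordBatch rowCount) rule concrete, here is a
minimal sketch in plain Java. It does not call the Arrow Dataset API. The class
BatchSizeSketch and the helper rowsPerLoad are hypothetical names introduced
only for illustration, and the row count of 1,000 is an assumed value. The
sketch only models the behavior described above.

```java
public class BatchSizeSketch {

    // Hypothetical helper (not part of Arrow): models the behavior described
    // in this issue, where one load yields min(batchSize, rowCount) rows.
    static long rowsPerLoad(long batchSize, long recordBatchRowCount) {
        return Math.min(batchSize, recordBatchRowCount);
    }

    public static void main(String[] args) {
        long recordBatchRowCount = 1_000; // assumed rows in one source RecordBatch

        // batchSize smaller than the RecordBatch: batchSize caps the rows read.
        System.out.println(rowsPerLoad(100, recordBatchRowCount));    // prints 100

        // batchSize larger than the RecordBatch: the setting has no effect.
        System.out.println(rowsPerLoad(32_768, recordBatchRowCount)); // prints 1000
    }
}
```

Under this model, raising batchSize past the RecordBatch row count never
increases the number of rows returned per load, which is the likely source of
the confusion.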



