Larry White created ARROW-17346:
-----------------------------------
Summary: Document the use of the batchSize argument in Dataset
ScanOptions
Key: ARROW-17346
URL: https://issues.apache.org/jira/browse/ARROW-17346
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Affects Versions: 9.0.0
Reporter: Larry White
Assignee: Larry White
Several ScanOptions constructors take a batchSize argument, for example:
    public ScanOptions(long batchSize) {
      this(batchSize, Optional.empty());
    }
Since the scanner reads one ArrowRecordBatch per load invocation, setting the
parameter to a value larger than the number of rows in the RecordBatch has no
effect. It only takes effect when it is smaller than that row count; that is,
the number of records read is equal to min(batchSize, recordBatch rowCount).
This can lead to some confusion.
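
To make the min(batchSize, recordBatch rowCount) rule concrete, here is a minimal sketch in plain Java. It is a model of the behavior described above, not Arrow code: the class BatchSizeSketch and both of its methods are hypothetical names introduced only for illustration. It also assumes that when a source RecordBatch holds more rows than batchSize, the remaining rows arrive in later loads rather than being dropped, and that rows from different source batches are never merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behavior described in this issue; not part of Arrow.
final class BatchSizeSketch {

  // Rows returned by one load: never more than batchSize, and never more
  // than the rows available in the underlying record batch.
  static long rowsPerLoad(long batchSize, long recordBatchRowCount) {
    return Math.min(batchSize, recordBatchRowCount);
  }

  // Assumption: a source batch larger than batchSize is delivered as several
  // smaller batches, and source batches are never combined into a larger one.
  static List<Long> emittedBatchSizes(long batchSize, long[] sourceRowCounts) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Long> emitted = new ArrayList<>();
    for (long remaining : sourceRowCounts) {
      // Keep loading from the current source batch until it is exhausted.
      while (remaining > 0) {
        long rows = rowsPerLoad(batchSize, remaining);
        emitted.add(rows);
        remaining -= rows;
      }
    }
    return emitted;
  }
}
```

Under these assumptions, a single 100-row source batch read with batchSize 30 yields batches of 30, 30, 30 and 10 rows, while source batches of 100 and 50 rows read with batchSize 1000 still yield 100 and 50 rows: the larger batchSize does not combine them.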