paveon opened a new pull request, #817:
URL: https://github.com/apache/arrow-go/pull/817
### Rationale for this change
`GetRecordReader` passes `BatchSize` directly to the internal `recordReader`
without capping it to the actual number of rows. When `BatchSize` is
configured to a large value (e.g. 131072) but the file or requested row
groups contain few rows (e.g. 10), `leafReader.LoadBatch` calls
`Reserve(131072)`, which pre-allocates definition/repetition level buffers
and value buffers sized for the full batch. For a 200-column int64 table
with 10 rows, this allocates roughly 250 MB that is never used.
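
As a rough sanity check on the ~250 MB figure, the sketch below multiplies out the reservation under stated assumptions: 8 bytes per int64 value plus 2 bytes per definition level, and no repetition levels for flat columns. This per-row breakdown is an assumption, not something taken from the reader's code, and `reservedBytes` is a hypothetical helper used only for illustration.

```go
package main

import "fmt"

// reservedBytes estimates the bytes pre-allocated when every column reserves
// space for a full batch. bytesPerRow is an assumed per-row cost, not a value
// read from the parquet reader.
func reservedBytes(columns, batchSize, bytesPerRow int64) int64 {
	return columns * batchSize * bytesPerRow
}

func main() {
	const (
		columns     = 200
		bytesPerRow = 8 + 2 // assumed: int64 value + one 2-byte definition level
	)

	full := reservedBytes(columns, 131072, bytesPerRow)
	capped := reservedBytes(columns, 16, bytesPerRow) // 16 = next power of two >= 10 rows

	fmt.Printf("uncapped: %d bytes (%d MiB)\n", full, full/(1<<20))
	fmt.Printf("capped:   %d bytes\n", capped)
}
```

Under these assumptions the uncapped reservation comes to 262,144,000 bytes (250 MiB), while a batch capped at 16 rows needs 32,000 bytes.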
### What changes are included in this PR?
Cap `batchSize` to `NextPowerOf2(nrows)` when a `BatchSize` is explicitly
configured. The power-of-2 rounding keeps allocations aligned with the
downstream `updateCapacity` logic that already rounds to powers of two,
avoiding a redundant reallocation on the first read.
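
The capping rule can be sketched as below. This is a minimal illustration, not the code in this PR: `capBatchSize` and the local `nextPowerOf2` are hypothetical stand-ins, and treating a non-positive `batchSize` as "not explicitly configured" is an assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextPowerOf2 returns the smallest power of two >= n (1 for n <= 1).
// Hypothetical stand-in for the NextPowerOf2 helper named in the PR.
func nextPowerOf2(n int64) int64 {
	if n <= 1 {
		return 1
	}
	return 1 << uint(bits.Len64(uint64(n-1)))
}

// capBatchSize limits a configured batch size to the next power of two at or
// above the number of rows that will actually be read.
func capBatchSize(batchSize, nrows int64) int64 {
	if batchSize <= 0 {
		// Assumed to mean "not explicitly configured": leave unchanged.
		return batchSize
	}
	if capped := nextPowerOf2(nrows); capped < batchSize {
		return capped
	}
	return batchSize
}

func main() {
	fmt.Println(capBatchSize(131072, 10)) // large batch, few rows: capped
	fmt.Println(capBatchSize(64, 1000))   // batch already smaller: unchanged
}
```

Rounding up rather than using `nrows` directly means the first reservation already has the size a power-of-two growth policy would choose, which is the redundant reallocation the description refers to.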
### Are these changes tested?
Existing tests pass. The change affects only the allocation-sizing path;
read correctness is unaffected since `LoadBatch` already stops reading
when rows are exhausted.
### Are there any user-facing changes?
No