[
https://issues.apache.org/jira/browse/ARROW-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-16638:
-----------------------------------
Labels: pull-request-available (was: )
> [Go][Parquet] Boolean column reader fails to skip rows
> ------------------------------------------------------
>
> Key: ARROW-16638
> URL: https://issues.apache.org/jira/browse/ARROW-16638
> Project: Apache Arrow
> Issue Type: Bug
> Components: Go
> Reporter: Matt DePero
> Priority: Major
> Labels: pull-request-available
> Fix For: 9.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Skipping values in the Go Parquet column reader is effectively implemented by
> reading the target number of rows into scratch space, which is then discarded.
> In the boolean case,
> [BytesRequired|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader.go#L439]
> yields a scratch buffer sized at one bit per row. However, the reader also
> tries to use that [same scratch
> space|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213]
> for `defLvls` and `repLvls` (both int16), each of which requires two bytes
> per row. Since the boolean `values` buffer is not large enough to hold def
> and rep levels for the same number of rows, skipping too many rows results
> in an index out of bounds panic.
>
> Note that for other column types this does not seem to be an issue, since the
> buffer needed for `values` is always larger than the buffer needed for def
> and rep levels. However, there still seems to be no reason to pass any
> non-nil value to `cr.ReadBatch(...)` for [rep and def
> lvls|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213]
> when skipping any column in the reader.
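
The size mismatch described above can be illustrated with a little arithmetic.
The sketch below is not the Arrow reader code: the helper names
(boolBytesRequired, levelBytesRequired, maxLevelsInScratch) are hypothetical,
and it assumes the boolean scratch buffer is bit-packed and rounded up to whole
bytes, with int16 levels taking two bytes each.

```go
package main

import "fmt"

// boolBytesRequired is a hypothetical stand-in for the boolean sizing rule:
// one bit per row, rounded up to whole bytes (assumption, not the Arrow API).
func boolBytesRequired(rows int) int {
	return (rows + 7) / 8
}

// levelBytesRequired is the space needed to hold one int16 level per row.
func levelBytesRequired(rows int) int {
	return rows * 2
}

// maxLevelsInScratch reports how many int16 levels fit in a scratch buffer
// that was sized for `rows` bit-packed booleans.
func maxLevelsInScratch(rows int) int {
	return boolBytesRequired(rows) / 2
}

func main() {
	const rows = 1024
	fmt.Println("scratch bytes sized for booleans:", boolBytesRequired(rows))
	fmt.Println("bytes needed for int16 levels:   ", levelBytesRequired(rows))
	fmt.Println("levels that actually fit:        ", maxLevelsInScratch(rows))
}
```

Under these assumptions, a skip of 1024 rows gets a 128-byte scratch buffer,
while the levels for the same rows need 2048 bytes, so only 64 levels fit before
an index runs past the end of the buffer.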
--
This message was sent by Atlassian Jira
(v8.20.7#820007)