[jira] [Updated] (ARROW-16638) [Go][Parquet] Boolean column reader fails to skip rows

ASF GitHub Bot (Jira) Mon, 23 May 2022 16:50:08 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-16638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated ARROW-16638:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Go][Parquet] Boolean column reader fails to skip rows
> ------------------------------------------------------
>
>                 Key: ARROW-16638
>                 URL: https://issues.apache.org/jira/browse/ARROW-16638
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go
>            Reporter: Matt DePero
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Skipping values in the Go Parquet column reader is effectively implemented by 
> reading the target number of rows into scratch space which is then discarded. 
> In the boolean case, 
> [BytesRequired|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader.go#L439]
>  returns a scratch buffer sized at one bit per row. However, 
> that [same scratch 
> space|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213]
>  is also used for `defLvls` and `repLvls` (both int16), each of which 
> requires two bytes per row. Since the boolean `values` buffer is not large 
> enough to hold the same number of rows' worth of def and rep levels, skipping 
> too many rows results in an index-out-of-bounds panic.
>  
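
To make the size mismatch above concrete, here is a minimal arithmetic sketch. The helper names (`bytesForBits`, `levelBytes`, `maxLevelsThatFit`) are hypothetical and are not the Arrow functions themselves; the sketch only assumes that bit-packed booleans round up to whole bytes and that each int16 level takes two bytes.

```go
package main

import "fmt"

// bytesForBits rounds a bit count up to whole bytes, as a bit-packed
// boolean buffer would need (hypothetical helper for illustration).
func bytesForBits(bits int) int { return (bits + 7) / 8 }

// levelBytes is the space needed to hold n int16 def or rep levels.
func levelBytes(n int) int { return n * 2 }

// maxLevelsThatFit reports how many int16 levels fit in a scratch
// buffer that was sized for n bit-packed boolean values.
func maxLevelsThatFit(n int) int { return bytesForBits(n) / 2 }

func main() {
	const rows = 1024
	fmt.Println("boolean scratch bytes:", bytesForBits(rows))
	fmt.Println("bytes needed for int16 levels:", levelBytes(rows))
	fmt.Println("levels that actually fit:", maxLevelsThatFit(rows))
}
```

For 1024 rows the boolean scratch holds 128 bytes while the levels need 2048, so only 64 levels fit before an index past the end of the buffer is reached.
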
> Note that this does not seem to be an issue for other column types, since the 
> buffer needed for `values` is always larger than the buffer needed for def 
> and rep levels. However, there still seems to be no reason to pass any 
> non-nil value to `cr.ReadBatch(...)` for [rep and def 
> lvls|https://github.com/apache/arrow/blob/4c21fd12f93e4853c03c05919ffb22c6bb8f09b0/go/parquet/file/column_reader_types.gen.go#L212-L213]
>  when skipping any column in the reader.
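
The suggestion above (skip by reading into scratch and passing nil for both level slices) can be sketched with a toy reader. Everything here is hypothetical: `toyBoolReader`, its simplified `ReadBatch` signature, and the chunk size are illustrative stand-ins rather than the Arrow Go API, and whether the real reader accepts nil levels for every column is the reporter's proposal, not something this sketch confirms.

```go
package main

import "fmt"

// toyBoolReader is a hypothetical stand-in for a boolean column reader.
type toyBoolReader struct {
	remaining int64 // rows left in the column
}

// ReadBatch fills values with up to n rows. Levels are written only when
// the caller supplies non-nil slices, so nil slices are never indexed.
func (r *toyBoolReader) ReadBatch(n int64, values []bool, defLvls, repLvls []int16) int64 {
	if n > r.remaining {
		n = r.remaining
	}
	if n > int64(len(values)) {
		n = int64(len(values))
	}
	for i := int64(0); i < n; i++ {
		values[i] = true
		if defLvls != nil {
			defLvls[i] = 1 // would panic here if defLvls were too short
		}
		if repLvls != nil {
			repLvls[i] = 0
		}
	}
	r.remaining -= n
	return n
}

// skip discards up to n rows by reading them into a scratch slice in
// fixed-size chunks, passing nil for both level slices.
func skip(r *toyBoolReader, n int64) int64 {
	const chunk = 1024
	scratch := make([]bool, chunk)
	var skipped int64
	for skipped < n {
		want := n - skipped
		if want > chunk {
			want = chunk
		}
		got := r.ReadBatch(want, scratch[:want], nil, nil)
		if got == 0 {
			break // column exhausted
		}
		skipped += got
	}
	return skipped
}

func main() {
	r := &toyBoolReader{remaining: 5000}
	fmt.Println(skip(r, 3000), r.remaining)
}
```

Because the scratch slice is typed for the values alone and no level slices are supplied, the skip path cannot overrun a buffer that was sized for a different element width.
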



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
