[
https://issues.apache.org/jira/browse/ARROW-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Topol reassigned ARROW-17169:
-------------------------------------
Assignee: Matthew Topol
> [Go] goPanicIndex in firstTimeBitmapWriter.Finish()
> ---------------------------------------------------
>
> Key: ARROW-17169
> URL: https://issues.apache.org/jira/browse/ARROW-17169
> Project: Apache Arrow
> Issue Type: Bug
> Components: Go, Parquet
> Affects Versions: 9.0.0, 8.0.1
> Environment: go (1.18.3), Linux, AMD64
> Reporter: Robert Purdom
> Assignee: Matthew Topol
> Priority: Critical
>
> I'm working with complex Parquet files that have 500+ "root" columns, where
> some fields are lists of structs, internally referred to as 'topics'. Some of
> these structs have hundreds of columns. When reading a particular topic, I get
> an index-out-of-range panic at the line indicated below. The error occurs when
> the value for the topic is null, that is, when this particular root record has
> no data for the topic. The root is household data and the topic is auto, so
> the error occurs when the household has no autos. The auto field is a nullable
> list of struct.
>
> {code:go}
> /* Finish() was called from defLevelsToBitmapInternal.
>    Values when the panic occurs:
>      bw.length     == 17531
>      bw.bitMask    == 1
>      bw.pos        == 3424
>      len(bw.buf)   == 428
>      cap(bw.buf)   == 448
>      bw.byteOffset == 428
>      bw.curByte    == 0
> */
> // bitmap_writer.go
> func (bw *firstTimeBitmapWriter) Finish() {
> 	// store curByte into the bitmap
> 	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
> 		bw.buf[int(bw.byteOffset)] = bw.curByte // <---- Panic index
> 	}
> }
> {code}
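Checking the arithmetic on the reported values may help narrow this down. The sketch below is not Arrow code: `bytesForBits` and `finishWouldWrite` are hypothetical helpers that only restate the usual bits-to-bytes rounding and the condition in Finish() quoted above. With the values from the report, a bitmap of 17531 bits would need 2192 bytes, while the buffer holds 428, and byteOffset (3424 / 8 = 428) sits exactly one past the last valid index.

```go
package main

import "fmt"

// bytesForBits returns the number of bytes needed to hold n bits.
// Hypothetical helper for this sketch, not taken from the Arrow source.
func bytesForBits(n int64) int64 {
	return (n + 7) / 8
}

// finishWouldWrite restates the condition guarding the write in Finish().
// Note that && binds tighter than ||, so this is (a && b) || c.
func finishWouldWrite(length, pos int64, bitMask uint8) bool {
	return length > 0 && bitMask != 0x01 || pos < length
}

func main() {
	// Values copied from the report.
	const (
		length     int64 = 17531
		pos        int64 = 3424
		bufLen     int64 = 428
		byteOffset int64 = 428
	)
	var bitMask uint8 = 1

	fmt.Println("bytes needed for length:", bytesForBits(length))
	fmt.Println("bytes available:", bufLen)
	fmt.Println("Finish would write:", finishWouldWrite(length, pos, bitMask))
	fmt.Println("index in range:", byteOffset < bufLen)
}
```

The write is triggered by the `bw.pos < bw.length` half of the condition alone, since bitMask is 1. If the writer is expected to cover all `length` bits, the figures suggest the buffer was sized for fewer bits than the writer was told about; that is an inference from the numbers, not something confirmed against the source.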
> In every case, when the panic occurs, bw.byteOffset == len(bw.buf). I tested
> the modification below and it does prevent the panic. However, it probably
> only masks the underlying bug.
> {code:go}
> // Test version: No Panic
> func (bw *firstTimeBitmapWriter) Finish() {
> 	// store curByte into the bitmap
> 	if bw.length > 0 && bw.bitMask != 0x01 || bw.pos < bw.length {
> 		if int(bw.byteOffset) == len(bw.buf) {
> 			// byteOffset is one past the end: grow the buffer by one byte
> 			bw.buf = append(bw.buf, bw.curByte)
> 		} else {
> 			bw.buf[int(bw.byteOffset)] = bw.curByte
> 		}
> 	}
> }
> {code}
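One reason the append-based workaround may only mask the problem is how Go slices behave when capacity is left over. The sketch below uses a hypothetical `sketchWriter` type, not the Arrow struct, and assumes the caller keeps its own slice header over the same backing array (an assumption, not verified against the Arrow source). With len 428 and cap 448 as in the report, `append` writes in place and lengthens only the writer's copy of the slice header, so the caller's slice still has length 428.

```go
package main

import "fmt"

// sketchWriter is a hypothetical stand-in that holds only the fields
// needed to illustrate the workaround; it is not the Arrow type.
type sketchWriter struct {
	buf        []byte
	byteOffset int64
	curByte    byte
}

// finishWithAppend applies the append-based workaround from the report.
func (w *sketchWriter) finishWithAppend() {
	if int(w.byteOffset) == len(w.buf) {
		w.buf = append(w.buf, w.curByte)
	} else {
		w.buf[int(w.byteOffset)] = w.curByte
	}
}

func main() {
	// Same len and cap as in the report.
	callerBuf := make([]byte, 428, 448)
	w := &sketchWriter{buf: callerBuf, byteOffset: 428, curByte: 0xAB}
	w.finishWithAppend()

	fmt.Println("writer len:", len(w.buf))
	fmt.Println("caller len:", len(callerBuf))
	// The byte landed in the shared backing array, past the caller's length.
	fmt.Println("byte past caller len:", callerBuf[:429][428])
}
```

Under that assumption the panic goes away, but the extra byte is invisible to the caller unless it reslices, which is consistent with the suspicion that the real defect lies in how the buffer was sized.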
--
This message was sent by Atlassian Jira
(v8.20.10#820010)