powersj opened a new issue, #40672: URL: https://github.com/apache/arrow/issues/40672
### Describe the bug, including details regarding any error messages, version, and platform.

While working on a parquet file parser, I was running our tests on a 32-bit system and came across a panic. This is reproducible with the in-tree `parquet-reader`, using the following parquet file.

First, it works on my 64-bit system as expected:

```
$ git clone [email protected]:apache/arrow
$ cd arrow/go/parquet/cmd/parquet_reader
$ cp ~/input.parquet .
$ go run . input.parquet
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0 ---
--- Total Bytes: 201 ---
--- Rows: 1 ---
Column 0
 Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 92, Compressed Size: 96
Column 1
 Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value |timestamp |
42 |1710683608143228692|
```

Once I force a 32-bit architecture, you can see the crash:

```
$ GOARCH=386 go run . input.parquet
File name: input.parquet
Version: v2.6
Created By: parquet-cpp-arrow version 15.0.1
Num Rows: 1
Number of RowGroups: 1
Number of Real Columns: 2
Number of Columns: 2
Number of Selected Columns: 2
Column 0: value (INT64)
Column 1: timestamp (BYTE_ARRAY/UTF8)
--- Row Group: 0 ---
--- Total Bytes: 201 ---
--- Rows: 1 ---
Column 0
 Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 92, Compressed Size: 96
Column 1
 Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
 Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
 Uncompressed Size: 109, Compressed Size: 113
--- Values ---
value |timestamp |
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x83502c7]

goroutine 1 [running]:
github.com/apache/arrow/go/v16/internal/utils.GetMinMaxInt32(...)
	/home/powersj/test/arrow/go/internal/utils/min_max.go:190
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*Int64DictConverter).IsValid(0x9492fe0, {0x9413270, 0x1, 0x1})
	/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:495 +0x27
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDictInt64(0x94e0708, {0x8488974, 0x9492fe0}, {0x95b4c00, 0x1, 0x80})
	/home/powersj/test/arrow/go/parquet/internal/utils/typed_rle_dict.gen.go:378 +0x228
github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDict(0x94e0708, {0x8488974, 0x9492fe0}, {0x839e4e0, 0x9511a84})
	/home/powersj/test/arrow/go/parquet/internal/utils/rle.go:417 +0x14e
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*dictDecoder).decode(...)
	/home/powersj/test/arrow/go/parquet/internal/encoding/decoder.go:146
github.com/apache/arrow/go/v16/parquet/internal/encoding.(*DictInt64Decoder).Decode(0x95ae700, {0x95b4c00, 0x1, 0x80})
	/home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:436 +0x75
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch.func1(0x0, 0x1)
	/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:93 +0xc3
github.com/apache/arrow/go/v16/parquet/file.(*columnChunkReader).readBatch(0x959c428, 0x80, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, 0x80}, 0x9511b6c)
	/home/powersj/test/arrow/go/parquet/file/column_reader.go:514 +0x2ab
github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch(0x959c428, 0x80, {0x95b4c00, 0x80, 0x80}, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, ...})
	/home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:92 +0xa3
main.(*Dumper).readNextBatch(0x94a0820)
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:88 +0x27f
main.(*Dumper).Next(0x94a0820)
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:163 +0x61
main.main()
	/home/powersj/test/arrow/go/parquet/cmd/parquet_reader/main.go:359 +0x2a69
exit status 2
```

The parquet file I used was generated via the following:

```python
#!/usr/bin/env python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# One row: an int64 column and a string column holding a nanosecond timestamp.
df = pd.DataFrame({
    'value': [42],
    'timestamp': ["1710683608143228692"]
})

# Use the aliases imported above (pd, pa, pq); the bare module names are not bound.
pq.write_table(pa.Table.from_pandas(df), "input.parquet")
```

### Component(s)

Go
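The report above does not identify a root cause. As a purely illustrative sketch of one mechanism that produces the same panic message as the `GetMinMaxInt32(...)` frame, the Go program below calls a package-level function variable that was never assigned, then assigns a portable fallback and calls it again. All names here (`minMaxInt32`, `pureGoMinMaxInt32`, `callMinMax`) are hypothetical and are not taken from the Arrow source; whether Arrow's `min_max.go` follows this pattern is an assumption that would need to be checked against the code.

```go
package main

import "fmt"

// minMaxInt32 is a hypothetical function variable standing in for an
// implementation that is selected per architecture at init time. It is
// deliberately left nil to mimic a platform where nothing was registered.
var minMaxInt32 func(values []int32) (lo, hi int32)

// pureGoMinMaxInt32 is a portable fallback with no architecture dependency.
func pureGoMinMaxInt32(values []int32) (lo, hi int32) {
	if len(values) == 0 {
		return 0, 0
	}
	lo, hi = values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

// callMinMax invokes the function variable and converts a runtime panic
// (such as calling a nil function value) into an ordinary error.
func callMinMax(values []int32) (lo, hi int32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	lo, hi = minMaxInt32(values)
	return lo, hi, nil
}

func main() {
	// The variable is still nil here, so the call panics and is recovered.
	_, _, err := callMinMax([]int32{3, 1, 2})
	fmt.Println("before registration:", err)

	// After a fallback is assigned, the same call succeeds.
	minMaxInt32 = pureGoMinMaxInt32
	lo, hi, err := callMinMax([]int32{3, 1, 2})
	fmt.Println("after registration:", lo, hi, err)
}
```

If this pattern does apply, a quick check on the affected checkout would be to see which files assign the function variables used by `min_max.go` and whether any of them is compiled when `GOARCH=386` is set.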
