powersj opened a new issue, #40672:
URL: https://github.com/apache/arrow/issues/40672

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   While working on a Parquet file parser, I ran our tests on a 32-bit system
   and hit a panic. It is reproducible with the in-tree `parquet_reader` and
   the Parquet file generated by the script at the end of this report. On my
   64-bit system, it works as expected:
   
   ```
   $ git clone git@github.com:apache/arrow
   $ cd arrow/go/parquet/cmd/parquet_reader
   $ cp ~/input.parquet .
   $ go run . input.parquet 
   File name: input.parquet
   Version: v2.6
   Created By: parquet-cpp-arrow version 15.0.1
   Num Rows: 1
   Number of RowGroups: 1
   Number of Real Columns: 2
   Number of Columns: 2
   Number of Selected Columns: 2
   Column 0: value (INT64)
   Column 1: timestamp (BYTE_ARRAY/UTF8)
   --- Row Group: 0  ---
   --- Total Bytes: 201  ---
   --- Rows: 1  ---
   Column 0
    Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
    Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
    Uncompressed Size: 92, Compressed Size: 96
   Column 1
    Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
    Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
    Uncompressed Size: 109, Compressed Size: 113
   --- Values ---
   value             |timestamp         |
   42                |1710683608143228692|
   ```
   
   Once I force a 32-bit build with `GOARCH=386`, the same command panics:
   
   ```
   $ GOARCH=386 go run . input.parquet
   File name: input.parquet
   Version: v2.6
   Created By: parquet-cpp-arrow version 15.0.1
   Num Rows: 1
   Number of RowGroups: 1
   Number of Real Columns: 2
   Number of Columns: 2
   Number of Selected Columns: 2
   Column 0: value (INT64)
   Column 1: timestamp (BYTE_ARRAY/UTF8)
   --- Row Group: 0  ---
   --- Total Bytes: 201  ---
   --- Rows: 1  ---
   Column 0
    Values: 1, Min: 42, Max: 42, Null Values: 0, Distinct Values: 0
    Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
    Uncompressed Size: 92, Compressed Size: 96
   Column 1
    Values: 1, Min: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Max: [49 55 49 48 54 56 51 54 48 56 49 52 51 50 50 56 54 57 50], Null Values: 0, Distinct Values: 0
    Compression: SNAPPY, Encodings: PLAIN RLE RLE_DICTIONARY
    Uncompressed Size: 109, Compressed Size: 113
   --- Values ---
   value             |timestamp         |
   panic: runtime error: invalid memory address or nil pointer dereference
   [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x83502c7]
   
   goroutine 1 [running]:
   github.com/apache/arrow/go/v16/internal/utils.GetMinMaxInt32(...)
        /home/powersj/test/arrow/go/internal/utils/min_max.go:190
   github.com/apache/arrow/go/v16/parquet/internal/encoding.(*Int64DictConverter).IsValid(0x9492fe0, {0x9413270, 0x1, 0x1})
        /home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:495 +0x27
   github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDictInt64(0x94e0708, {0x8488974, 0x9492fe0}, {0x95b4c00, 0x1, 0x80})
        /home/powersj/test/arrow/go/parquet/internal/utils/typed_rle_dict.gen.go:378 +0x228
   github.com/apache/arrow/go/v16/parquet/internal/utils.(*RleDecoder).GetBatchWithDict(0x94e0708, {0x8488974, 0x9492fe0}, {0x839e4e0, 0x9511a84})
        /home/powersj/test/arrow/go/parquet/internal/utils/rle.go:417 +0x14e
   github.com/apache/arrow/go/v16/parquet/internal/encoding.(*dictDecoder).decode(...)
        /home/powersj/test/arrow/go/parquet/internal/encoding/decoder.go:146
   github.com/apache/arrow/go/v16/parquet/internal/encoding.(*DictInt64Decoder).Decode(0x95ae700, {0x95b4c00, 0x1, 0x80})
        /home/powersj/test/arrow/go/parquet/internal/encoding/typed_encoder.gen.go:436 +0x75
   github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch.func1(0x0, 0x1)
        /home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:93 +0xc3
   github.com/apache/arrow/go/v16/parquet/file.(*columnChunkReader).readBatch(0x959c428, 0x80, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, 0x80}, 0x9511b6c)
        /home/powersj/test/arrow/go/parquet/file/column_reader.go:514 +0x2ab
   github.com/apache/arrow/go/v16/parquet/file.(*Int64ColumnChunkReader).ReadBatch(0x959c428, 0x80, {0x95b4c00, 0x80, 0x80}, {0x94f2300, 0x80, 0x80}, {0x94f2400, 0x80, ...})
        /home/powersj/test/arrow/go/parquet/file/column_reader_types.gen.go:92 +0xa3
   main.(*Dumper).readNextBatch(0x94a0820)
        /home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:88 +0x27f
   main.(*Dumper).Next(0x94a0820)
        /home/powersj/test/arrow/go/parquet/cmd/parquet_reader/dumper.go:163 +0x61
   main.main()
        /home/powersj/test/arrow/go/parquet/cmd/parquet_reader/main.go:359 +0x2a69
   exit status 2
   ```
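   
   The trace ends inside `utils.GetMinMaxInt32` with a nil dereference at
   address `0x0`. One Go pattern that can fail this way is a function variable
   that only architecture-specific init code assigns, so it stays nil on an
   architecture that no init handles. Whether that is the cause here is an
   assumption and has not been checked against the Arrow source. A minimal
   sketch of the pattern, in which every identifier is hypothetical:
   
   ```go
   package main

   import "fmt"

   // minMaxFuncs stands in for a dispatch table that per-architecture init
   // functions are expected to fill in. Hypothetical; not Arrow's actual code.
   var minMaxFuncs struct {
   	i32 func(values []int32) (lo, hi int32)
   }

   // pureGoMinMaxInt32 is a portable fallback with no assembly dependency.
   func pureGoMinMaxInt32(values []int32) (lo, hi int32) {
   	if len(values) == 0 {
   		return 0, 0
   	}
   	lo, hi = values[0], values[0]
   	for _, v := range values[1:] {
   		if v < lo {
   			lo = v
   		}
   		if v > hi {
   			hi = v
   		}
   	}
   	return lo, hi
   }

   // getMinMaxInt32 calls through the table. If the entry was never assigned,
   // calling the nil function value panics with a nil pointer dereference.
   func getMinMaxInt32(values []int32) (lo, hi int32) {
   	return minMaxFuncs.i32(values)
   }

   // callRecovered wraps getMinMaxInt32 and converts a panic into an error so
   // both the failing and the working case can be shown in one run.
   func callRecovered(values []int32) (lo, hi int32, err error) {
   	defer func() {
   		if r := recover(); r != nil {
   			err = fmt.Errorf("recovered: %v", r)
   		}
   	}()
   	lo, hi = getMinMaxInt32(values)
   	return lo, hi, nil
   }

   func main() {
   	// No init has assigned minMaxFuncs.i32 yet, as could happen on a GOARCH
   	// that none of the per-architecture files cover.
   	_, _, err := callRecovered([]int32{3, 1, 2})
   	fmt.Println("before fallback:", err)

   	// Registering a pure-Go fallback makes the same call succeed.
   	minMaxFuncs.i32 = pureGoMinMaxInt32
   	lo, hi, err := callRecovered([]int32{3, 1, 2})
   	fmt.Println("after fallback:", lo, hi, err)
   }
   ```
   
   If this is indeed the pattern involved, a default pure-Go assignment for
   architectures without a specialized implementation would avoid the nil call.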
   
   The parquet file I used was generated via the following:
   
   ```python
   #!/usr/bin/env python
   import pandas as pd
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   df = pd.DataFrame({
       'value': [42],
       'timestamp': ["1710683608143228692"]
   })
   
   # Convert the DataFrame to an Arrow table and write it as a Parquet file.
   pq.write_table(pa.Table.from_pandas(df), "input.parquet")
   ```
   
   ### Component(s)
   
   Go

