rmorgans opened a new issue, #584:
URL: https://github.com/apache/arrow-go/issues/584

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ## Describe the bug
   
   When a `FixedSizeList<float32>` array is written to Parquet via `pqarrow.FileWriter`, the values are correct in the in-memory record but are read back as `NULL` by `pqarrow.FileReader.ReadTable`.
   
   The same write pattern using a standard `List<float32>` (with `arrow.ListOf`) produces correct values.
   
     ## Reproduction
   
   ```go
   // fixedsize_list_parquet_repro.go
   //
   // Minimal reproduction for FixedSizeList + Parquet issue in Arrow Go.
   // Writes a FixedSizeList<float32>[8] with values [1..8] and reads it back
   // via pqarrow. On v14.0.2 the values are read as nulls, while the in-memory
   // record before writing is correct.
   
   package main
   
   import (
        "context"
        "fmt"
        "os"
        "path/filepath"
   
        "github.com/apache/arrow/go/v14/arrow"
        "github.com/apache/arrow/go/v14/arrow/array"
        "github.com/apache/arrow/go/v14/arrow/memory"
        "github.com/apache/arrow/go/v14/parquet"
        "github.com/apache/arrow/go/v14/parquet/file"
        "github.com/apache/arrow/go/v14/parquet/pqarrow"
   )
   
   func main() {
        const dim = 8
        expected := []float32{1, 2, 3, 4, 5, 6, 7, 8}
   
        out := filepath.Join(os.TempDir(), "fixedsize_bug.parquet")
        fmt.Println("Parquet file:", out)
   
        // Schema: FixedSizeList<float32>[8]
        schema := arrow.NewSchema(
                []arrow.Field{
                        {
                                Name: "embedding",
                                Type: arrow.FixedSizeListOf(int32(dim), arrow.PrimitiveTypes.Float32),
                        },
                },
                nil,
        )
   
        pool := memory.NewGoAllocator()
   
        // --- Write ---
   
        f, err := os.Create(out)
        if err != nil {
                panic(err)
        }
   
        props := parquet.NewWriterProperties()
        awProps := pqarrow.NewArrowWriterProperties()
   
        pw, err := pqarrow.NewFileWriter(schema, f, props, awProps)
        if err != nil {
                panic(err)
        }
   
        b := array.NewRecordBuilder(pool, schema)
        defer b.Release()
   
        flb := b.Field(0).(*array.FixedSizeListBuilder)
        vb := flb.ValueBuilder().(*array.Float32Builder)
   
        // Single FixedSizeList value [1..8]
        flb.Append(true)
        for _, v := range expected {
                vb.Append(v)
        }
   
        rec := b.NewRecord()
        defer rec.Release()
   
        fmt.Println("In-memory record before write:")
        fmt.Println(rec)
   
        if err := pw.Write(rec); err != nil {
                panic(err)
        }
   
        // Ensure Parquet footer and metadata are fully written
        if err := pw.Close(); err != nil {
                panic(err)
        }
   
        // --- Read back via pqarrow ---
   
        rf, err := os.Open(out)
        if err != nil {
                panic(err)
        }
        defer rf.Close()
   
        pr, err := file.NewParquetReader(rf)
        if err != nil {
                panic(err)
        }
        defer pr.Close()
   
        fr, err := pqarrow.NewFileReader(pr, pqarrow.ArrowReadProperties{}, pool)
        if err != nil {
                panic(err)
        }
   
        tbl, err := fr.ReadTable(context.Background())
        if err != nil {
                panic(err)
        }
        defer tbl.Release()
   
        fmt.Println("\nExpected values:", expected)
        fmt.Println("Table read back:")
        fmt.Println(tbl)
   }
   ```
   
   Example output on v14.0.2:
   
   ```
   go run ./fixedsize_list_parquet_repro.go
   Parquet file: /var/folders/95/j3gr9h157fq0djs38znqgkg80000gn/T/fixedsize_bug.parquet
   In-memory record before write:
   record:
     schema:
     fields: 1
       - embedding: type=fixed_size_list<item: float32, nullable>[8]
     rows: 1
     col[0][embedding]: [[1 2 3 4 5 6 7 8]]
   
   
   Expected values: [1 2 3 4 5 6 7 8]
   Table read back:
   schema:
     fields: 1
       - embedding: type=list<list: float32, nullable>
              metadata: ["PARQUET:field_id": "-1"]
   embedding: [[[(null) (null) (null) (null) (null) (null) (null) (null)]]]
   
   ```
   
   ## Expected behavior
   
   The embedding values should be read back as `[1 2 3 4 5 6 7 8]`, matching the in-memory `FixedSizeList<float32>[8]` before the Parquet write.
   
     ## Actual behavior
   
   The embedding values are read back as a list of 8 NULL values when using `pqarrow.FileReader.ReadTable`, even though the in-memory record before writing is correct.
   
     ## Likely root cause (code-level)
   
   In `parquet/pqarrow/path_builder.go` (Arrow Go v14.0.2), the `FIXED_SIZE_LIST` case in `pathBuilder.Visit` does not update `p.nullableInParent` before visiting the child values, while the `LIST` case does.
   
   `addTerminalInfo` increments `p.info.maxDefLevel` when `p.nullableInParent` is true. For `LIST` this flag is set, so present values get the higher def-level; for `FIXED_SIZE_LIST` it remains false, so present values are encoded/decoded with a lower def-level and are interpreted as nulls.
   
   A minimal fix appears to be setting `p.nullableInParent = true` in the `FIXED_SIZE_LIST` branch before `Visit(larr.ListValues())`, mirroring the `LIST` handling.
   
     ## Environment
   
     - Arrow Go: v14.0.2
     - Go: 1.21+ (repro’d with go1.24 toolchain)
     - OS: macOS (ARM64)
     - Reader used: `pqarrow.FileReader.ReadTable` (behavior also visible when inspecting the Parquet file with DuckDB)
   
   ### Component(s)
   
   Parquet

