rmorgans opened a new issue, #584:
URL: https://github.com/apache/arrow-go/issues/584
### Describe the bug, including details regarding any error messages,
version, and platform.
## Describe the bug
When writing a `FixedSizeList<float32>` array to Parquet via
`pqarrow.FileWriter`, the values are correct in the in-memory record but are
read back as `NULL` by `pqarrow.FileReader.ReadTable`.
The same write pattern using a standard `List<float32>` (with
`arrow.ListOf`) produces correct values.
## Reproduction
```go
// fixedsize_list_parquet_repro.go
//
// Minimal reproduction for FixedSizeList + Parquet issue in Arrow Go.
// Writes a FixedSizeList<float32>[8] with values [1..8] and reads it back
// via pqarrow. On v14.0.2 the values are read as nulls, while the in-memory
// record before writing is correct.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"github.com/apache/arrow/go/v14/arrow"
	"github.com/apache/arrow/go/v14/arrow/array"
	"github.com/apache/arrow/go/v14/arrow/memory"
	"github.com/apache/arrow/go/v14/parquet"
	"github.com/apache/arrow/go/v14/parquet/file"
	"github.com/apache/arrow/go/v14/parquet/pqarrow"
)

func main() {
	const dim = 8
	expected := []float32{1, 2, 3, 4, 5, 6, 7, 8}
	out := filepath.Join(os.TempDir(), "fixedsize_bug.parquet")
	fmt.Println("Parquet file:", out)

	// Schema: FixedSizeList<float32>[8]
	schema := arrow.NewSchema(
		[]arrow.Field{
			{
				Name: "embedding",
				Type: arrow.FixedSizeListOf(int32(dim), arrow.PrimitiveTypes.Float32),
			},
		},
		nil,
	)
	pool := memory.NewGoAllocator()

	// --- Write ---
	f, err := os.Create(out)
	if err != nil {
		panic(err)
	}
	props := parquet.NewWriterProperties()
	awProps := pqarrow.NewArrowWriterProperties()
	pw, err := pqarrow.NewFileWriter(schema, f, props, awProps)
	if err != nil {
		panic(err)
	}
	b := array.NewRecordBuilder(pool, schema)
	defer b.Release()
	flb := b.Field(0).(*array.FixedSizeListBuilder)
	vb := flb.ValueBuilder().(*array.Float32Builder)

	// Single FixedSizeList value [1..8]
	flb.Append(true)
	for _, v := range expected {
		vb.Append(v)
	}
	rec := b.NewRecord()
	defer rec.Release()
	fmt.Println("In-memory record before write:")
	fmt.Println(rec)
	if err := pw.Write(rec); err != nil {
		panic(err)
	}
	// Ensure Parquet footer and metadata are fully written
	if err := pw.Close(); err != nil {
		panic(err)
	}

	// --- Read back via pqarrow ---
	rf, err := os.Open(out)
	if err != nil {
		panic(err)
	}
	defer rf.Close()
	pr, err := file.NewParquetReader(rf)
	if err != nil {
		panic(err)
	}
	defer pr.Close()
	fr, err := pqarrow.NewFileReader(pr, pqarrow.ArrowReadProperties{}, pool)
	if err != nil {
		panic(err)
	}
	tbl, err := fr.ReadTable(context.Background())
	if err != nil {
		panic(err)
	}
	defer tbl.Release()
	fmt.Println("\nExpected values:", expected)
	fmt.Println("Table read back:")
	fmt.Println(tbl)
}
```
Example output on v14.0.2:
```
go run ./fixedsize_list_parquet_repro.go
Parquet file: /var/folders/95/j3gr9h157fq0djs38znqgkg80000gn/T/fixedsize_bug.parquet
In-memory record before write:
record:
schema:
fields: 1
- embedding: type=fixed_size_list<item: float32, nullable>[8]
rows: 1
col[0][embedding]: [[1 2 3 4 5 6 7 8]]
Expected values: [1 2 3 4 5 6 7 8]
Table read back:
schema:
fields: 1
- embedding: type=list<list: float32, nullable>
metadata: ["PARQUET:field_id": "-1"]
embedding: [[[(null) (null) (null) (null) (null) (null) (null) (null)]]]
```
## Expected behavior
The embedding values should be read back as `[1 2 3 4 5 6 7 8]`, matching
the in-memory `FixedSizeList<float32>[8]` before the Parquet write.
## Actual behavior
The embedding values are read back as a list of 8 NULL values when using
`pqarrow.FileReader.ReadTable`, even though the in-memory record before writing
is correct.
## Likely root cause (code-level)
In `parquet/pqarrow/path_builder.go` (Arrow Go v14.0.2), the `FIXED_SIZE_LIST`
case in `pathBuilder.Visit` does not update `p.nullableInParent` before visiting
the child values, while the `LIST` case does.
`addTerminalInfo` increments `p.info.maxDefLevel` when `p.nullableInParent` is
true. For `LIST` this flag is set, so present values get the higher def-level;
for `FIXED_SIZE_LIST` it remains false, so present values are encoded/decoded
with a lower def-level and are interpreted as nulls.
A minimal fix appears to be setting `p.nullableInParent = true` in the
`FIXED_SIZE_LIST` branch before `Visit(larr.ListValues())`, mirroring the `LIST`
handling.
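To illustrate the suspected mechanism, here is a toy model of definition-level decoding using only the standard library. It is not the pqarrow code: `decodeLeaf`, `levelsFor`, and `render` are hypothetical helpers, and the level numbers (3 for the reader, 2 for the writer) are assumptions chosen to match a nullable list with a nullable element. It only shows how a writer that stops one level short would make every present value look null to a reader.

```go
package main

import "fmt"

// decodeLeaf is a toy model, not the pqarrow implementation. A slot holds a
// value only when its definition level equals the maximum level the reader
// derived from the schema; any lower level is treated as null (nil here).
func decodeLeaf(defLevels []int16, values []float32, readerMaxDef int16) []*float32 {
	out := make([]*float32, len(defLevels))
	next := 0 // index of the next stored value
	for i, lvl := range defLevels {
		if lvl == readerMaxDef && next < len(values) {
			v := values[next]
			out[i] = &v
			next++
		}
	}
	return out
}

// levelsFor returns n copies of the definition level a writer would assign
// to present values, given the maximum level the writer computed.
func levelsFor(n int, writerMaxDef int16) []int16 {
	levels := make([]int16, n)
	for i := range levels {
		levels[i] = writerMaxDef
	}
	return levels
}

// render formats decoded slots, printing nil slots as "(null)".
func render(slots []*float32) string {
	parts := make([]string, len(slots))
	for i, s := range slots {
		if s == nil {
			parts[i] = "(null)"
		} else {
			parts[i] = fmt.Sprint(*s)
		}
	}
	return fmt.Sprint(parts)
}

func main() {
	values := []float32{1, 2, 3, 4, 5, 6, 7, 8}
	// Assumed reader-side maximum: optional list + repeated group + optional element.
	const readerMaxDef int16 = 3

	// Writer and reader agree on the maximum level: values survive.
	fmt.Println("levels match:    ", render(decodeLeaf(levelsFor(len(values), 3), values, readerMaxDef)))
	// Writer skipped the element-level increment: every slot decodes as null.
	fmt.Println("writer one short:", render(decodeLeaf(levelsFor(len(values), 2), values, readerMaxDef)))
}
```

Under these assumptions the second call yields eight nulls, which resembles the read-back output above. Whether the actual writer produces exactly these levels has not been verified here.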
## Environment
- Arrow Go: v14.0.2
- Go: 1.21+ (repro’d with go1.24 toolchain)
- OS: macOS (ARM64)
- Reader used: `pqarrow.FileReader.ReadTable` (behavior also visible when inspecting the Parquet file with DuckDB)
### Component(s)
Parquet