rok opened a new issue, #855: URL: https://github.com/apache/arrow-go/issues/855
## Problem

Arrow `FixedSizeList<T, N>` is the natural type for fixed-shape data (embeddings, image/tensor patches, fixed-precision decimal vectors) where every value has exactly `N` elements and the shape is fixed and known from the schema. Today pqarrow round-trips it through Parquet as a standard 3-level `LIST`, writing per-element repetition and definition levels for a length that never varies. For wide dense vectors that is pure overhead; apache/arrow#34510 measured a ~3x read gap that motivates a denser encoding.

## Proposal

Add an experimental Parquet `VECTOR` `FieldRepetitionType` that stores a fixed number of element values per row directly, without per-element rep/def levels, and map Arrow `FixedSizeList` onto it. This is the "Option B" design from the *Fixed-size list type for Parquet* proposal (and the arrow-cpp prototype, rok/arrow#51).

A reduced, **leaf-only** first phase:

- A `VECTOR` column is a single primitive leaf carrying `vector_length`: `vector <element-type> <name> [N];`. It is not a nested group.
- Only dense, non-nullable, top-level `FixedSizeList` columns with a fixed-width primitive element are encoded as `VECTOR`. Everything else (nullable value or element, zero-length, variable-width/dictionary/extension/struct/nested-list element, or a nested `FixedSizeList`) transparently falls back to the standard `LIST` encoding. Nullable, struct, and nested vectors are follow-ups.
- Opt-in on the writer via `pqarrow.WithVectorEncoding()`; reading is automatic.

Format additions (not yet in apache/parquet-format): `FieldRepetitionType.VECTOR = 3` and `SchemaElement.vector_length` (field id 12).

## Caveat

`VECTOR` is not part of apache/parquet-format yet, so this is strictly opt-in and non-portable: files written with `VECTOR` are rejected by readers that don't understand the repetition type.
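
To make the leaf-only eligibility rule from the proposal concrete, here is a minimal Go sketch of the decision between `VECTOR` and the `LIST` fallback. It is an illustration only, not the pqarrow implementation: `ElemKind`, `FixedSizeListColumn`, `useVector`, and `schemaLine` are hypothetical names, the column description is a simplified stand-in for an Arrow field, and the writer opt-in (`pqarrow.WithVectorEncoding()`) is modelled as a plain boolean.

```go
package main

import "fmt"

// ElemKind is a hypothetical, simplified classification of the list element type.
type ElemKind int

const (
	FixedWidthPrimitive ElemKind = iota // e.g. int32, float, double
	VariableWidth                       // e.g. string, binary
	Dictionary
	Extension
	Struct
	NestedList // includes a nested FixedSizeList
)

// FixedSizeListColumn is a hypothetical description of one FixedSizeList column.
type FixedSizeListColumn struct {
	Name         string
	ElemType     string // element type name as it would appear in the schema line
	Elem         ElemKind
	ListSize     int  // N in FixedSizeList<T, N>
	Nullable     bool // the list value itself may be null
	ElemNullable bool // individual elements may be null
	TopLevel     bool // false when nested inside another type
}

// useVector reports whether the column qualifies for VECTOR under the
// leaf-only rules; any column that does not qualify falls back to LIST.
func useVector(c FixedSizeListColumn, vectorEncoding bool) bool {
	return vectorEncoding &&
		c.TopLevel &&
		!c.Nullable &&
		!c.ElemNullable &&
		c.ListSize > 0 &&
		c.Elem == FixedWidthPrimitive
}

// schemaLine renders the proposed leaf form: vector <element-type> <name> [N];
func schemaLine(c FixedSizeListColumn) string {
	return fmt.Sprintf("vector %s %s [%d];", c.ElemType, c.Name, c.ListSize)
}

func main() {
	emb := FixedSizeListColumn{
		Name: "embedding", ElemType: "float",
		Elem: FixedWidthPrimitive, ListSize: 768, TopLevel: true,
	}
	if useVector(emb, true) {
		fmt.Println(schemaLine(emb))
	}

	// A nullable list does not qualify and would be written as a standard LIST.
	nullable := emb
	nullable.Nullable = true
	fmt.Println("nullable column uses VECTOR:", useVector(nullable, true))
}
```

Keeping the rule as a single predicate mirrors the "transparently falls back" wording: the writer never fails on an ineligible column, it only picks the other encoding.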

## References

- *Fixed-size list type for Parquet* design proposal
- apache/arrow#34510: measured ~3x read gap
- arrow-cpp Option B prototype: rok/arrow#51

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
