[PR] Add Parquet variant shredding support [arrow-dotnet]

via GitHub Sun, 26 Apr 2026 07:58:12 -0700


CurtHagenlocher opened a new pull request, #332:
URL: https://github.com/apache/arrow-dotnet/pull/332


   ## What's Changed
   
   Implements the Parquet variant shredding spec end-to-end in a new 
`Apache.Arrow.Operations.Shredding` namespace, alongside minor changes to the 
base scalar and array types.
   
   Operations.Shredding reader side:
   - `ShreddedVariant` / `ShreddedObject` / `ShreddedArray` ref-struct trio 
exposing typed columns and residual bytes side-by-side.
   - `VariantArrayShreddingExtensions` adds `GetShreddedVariant(i)` and 
`GetLogicalVariantValue(i)` on `VariantArray`.
   - `ShredSchema.FromArrowType` derives a shredding schema from an Arrow 
typed_value type, rejecting unsupported types (uint32, fixed-size-binary(N≠16)).
   
   Operations.Shredding producer side:
     - `VariantShredder` decomposes a column of `VariantValues` against a 
`ShredSchema` into shared metadata + per-row `ShredResults`.
     - `ShreddedVariantArrayBuilder` assembles those into a shredded 
`VariantArray` with a `typed_value` Arrow tree matching the schema.
   
   Apache.Arrow changes:
   - `VariantExtensionDefinition` accepts `struct<metadata, value?, 
typed_value?>` layouts in addition to the plain unshredded form.
   - `VariantType` gains `IsShredded` / `HasValueColumn` / 
`HasTypedValueColumn` / `TypedValueField` properties.
   - `VariantArray.GetVariantValue` and `GetVariantReader` throw on shredded 
columns with a pointer to the `Operations.Shredding` extensions.
   - The public `VariantArray(IArrowArray)` constructor now infers the 
`VariantType` (shredded or not) from the storage shape.
   - Operations gains a project reference to Apache.Arrow; Apache.Arrow does 
not reference Operations.
   
     Apache.Arrow.Scalars changes:
   - `VariantValueWriter.CopyValue(VariantReader source)` transcodes a reader 
into this writer, re-resolving field IDs against the writer's metadata 
dictionary. Supports cross-dictionary transcoding and multi-source 
merge-into-one-dictionary workflows.
   - `VariantMetadataBuilder.CollectFieldNames(VariantReader source)` is the 
two-pass companion that accumulates source field names into the target metadata 
builder.
   
   Validation:
   - Conformance tests run against the Iceberg shredded-variant corpus in 
`apache/parquet-testing` (`test/parquet-testing/shredded_variant/`). 
`test/shredded_variant_ipc/regen.py` converts each `case-NNN.parquet` to an 
Arrow IPC file via `pyarrow`; 137 resulting .arrow files are checked in so CI 
needs no Python. All 128 valid conformance cases pass; 6 schema-invalid and 
data-invalid cases are rejected with clear errors; 3 "spec-invalid but 
permissive" INVALID cases are documented as read-without-throw.
   - Additional round-trip, reader-style, and builder tests were implemented


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] Add Parquet variant shredding support [arrow-dotnet]

Reply via email to