CurtHagenlocher opened a new pull request, #328:
URL: https://github.com/apache/arrow-dotnet/pull/328
## What's Changed
Implements the Parquet variant shredding spec end-to-end in a new
`Apache.Arrow.Operations.Shredding` namespace, alongside minor changes to the
base scalar and array types.
Operations.Shredding reader side:
- `ShreddedVariant` / `ShreddedObject` / `ShreddedArray` ref-struct trio
exposing typed columns and residual bytes side-by-side.
- `VariantArrayShreddingExtensions` adds `GetShreddedVariant(i)` and
`GetLogicalVariantValue(i)` on `VariantArray`.
- `ShredSchema.FromArrowType` derives a shredding schema from an Arrow
typed_value type, rejecting unsupported types (uint32, fixed-size-binary(N≠16)).
Operations.Shredding producer side:
- `VariantShredder` decomposes a column of `VariantValues` against a
`ShredSchema` into shared metadata + per-row `ShredResults`.
- `ShreddedVariantArrayBuilder` assembles those into a shredded
`VariantArray` with a `typed_value` Arrow tree matching the schema.
Apache.Arrow changes:
- `VariantExtensionDefinition` accepts `struct<metadata, value?,
typed_value?>` layouts in addition to the plain unshredded form.
- `VariantType` gains `IsShredded` / `HasValueColumn` /
`HasTypedValueColumn` / `TypedValueField` properties.
- `VariantArray.GetVariantValue` and `GetVariantReader` throw on shredded
columns with a pointer to the `Operations.Shredding` extensions.
- The public `VariantArray(IArrowArray)` constructor now infers the
`VariantType` (shredded or not) from the storage shape.
- Operations gains a project reference to Apache.Arrow; Apache.Arrow does
not reference Operations.
Apache.Arrow.Scalars changes:
- `VariantValueWriter.CopyValue(VariantReader source)` transcodes a reader
into this writer, re-resolving field IDs against the writer's metadata
dictionary. Supports cross-dictionary transcoding and multi-source
merge-into-one-dictionary workflows.
- `VariantMetadataBuilder.CollectFieldNames(VariantReader source)` is the
two-pass companion that accumulates source field names into the target metadata
builder.
Validation:
- Conformance tests run against the Iceberg shredded-variant corpus in
`apache/parquet-testing` (`test/parquet-testing/shredded_variant/`).
`test/shredded_variant_ipc/regen.py` converts each `case-NNN.parquet` to an
Arrow IPC file via `pyarrow`; 137 resulting .arrow files are checked in so CI
needs no Python. All 128 valid conformance cases pass; 6 schema-invalid and
data-invalid cases are rejected with clear errors; 3 "spec-invalid but
permissive" INVALID cases are documented as read-without-throw.
- Additional round-trip, reader-style, and builder tests were implemented
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]