mneetika opened a new pull request, #10014: URL: https://github.com/apache/arrow-rs/pull/10014
# Which issue does this PR close?

- Closes #10013
- Related to #6736

# Rationale for this change

`unshred_variant` currently has no support for `typed_value` columns stored as dictionary-encoded or run-end-encoded arrays (`DataType::Dictionary` and `DataType::RunEndEncoded`). Without that support, downstream analytical engines (such as Apache DataFusion) cannot unshred Parquet Variant columns end to end when those columns use dictionary or run-end layouts.

Rather than writing separate iteration code for each of these two types, this PR closes the gap by routing them through the existing `ArrowToVariantRowBuilder` abstraction.

# What changes are included in this PR?

1. **Extended `UnshredVariantRowBuilder`**: Added an `Arrow(ArrowUnshredRowBuilder)` variant to handle types that the Arrow-to-Variant conversion already supports but that have no dedicated unshred row builder.
2. **Plumbed `CastOptions` recursively**: Propagated `CastOptions` down through the recursive `try_new_opt` calls, so that type conversions behave consistently across nested structs, lists, and view types.
3. **Null and fallback handling**: Hooked the new row builder into the existing `handle_unshredded_case!` macro, so that null rows and rows that fall back to the unshredded value bytes are handled before the typed value is read.

# Are these changes tested?

Yes. Unit tests have been added to `unshred_variant.rs`:

- `test_unshred_dictionary_typed_value`: Validates dictionary key-to-value resolution, null keys, and repeated keys.
- `test_unshred_run_end_encoded_typed_value`: Verifies run boundary handling and multi-row string value reconstruction.
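
For reviewers less familiar with the two encodings the tests cover, the logical-row resolution they exercise can be sketched roughly as follows. This is a standalone model over plain slices, not the arrow-rs API and not the code in this PR; the function names `resolve_dictionary` and `resolve_run_end` are hypothetical and exist only for illustration.

```rust
/// Hypothetical model of dictionary decoding: each key indexes into `values`,
/// and a `None` key is a null row. In this sketch an out-of-range key also
/// maps to `None`; a real Arrow array would treat that as invalid data.
fn resolve_dictionary<'a>(
    keys: &[Option<usize>],
    values: &[&'a str],
) -> Vec<Option<&'a str>> {
    keys.iter()
        .map(|key| key.and_then(|k| values.get(k).copied()))
        .collect()
}

/// Hypothetical model of run-end decoding: `run_ends[i]` is the exclusive
/// logical end index of run `i`, so run `i` covers rows
/// `run_ends[i - 1]..run_ends[i]` (with an implicit start of 0 for run 0).
fn resolve_run_end<'a>(run_ends: &[usize], values: &[&'a str]) -> Vec<&'a str> {
    let mut rows = Vec::new();
    let mut start = 0;
    for (&end, &value) in run_ends.iter().zip(values) {
        // Repeat the run's value once per logical row in the run.
        for _ in start..end {
            rows.push(value);
        }
        start = end;
    }
    rows
}
```

Under these assumptions, keys `[Some(0), None, Some(1), Some(0)]` over values `["a", "b"]` resolve to four logical rows with a null in the second position, and run ends `[2, 3, 5]` over values `["x", "y", "z"]` expand to five logical rows. Per the description above, the PR obtains the per-row values through `ArrowToVariantRowBuilder` rather than with hand-written loops like these.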

# Are there any user-facing changes?

No. This change is entirely additive and non-breaking. It expands type coverage for the experimental Parquet Variant processing pipeline without modifying any public-facing function signatures or traits.
