mneetika opened a new pull request, #10014:
URL: https://github.com/apache/arrow-rs/pull/10014

   # Which issue does this PR close?
   
   - Closes #10013
   - Related to #6736
   
   # Rationale for this change
   
   The current `unshred_variant` implementation lacks support for 
dictionary-encoded and run-end encoded arrays (`DataType::Dictionary` and 
`DataType::RunEndEncoded`). Without this support, downstream analytical 
engines (such as Apache DataFusion) cannot fully apply end-to-end 
optimizations to Parquet Variant columns when those columns use dictionary 
or run-end encoded layouts. 
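   
   To illustrate why these two layouts need their own row resolution, here is 
a minimal, std-only sketch of how a logical row maps to a physical value in 
each one. `DictColumn` and `RunEndColumn` are hypothetical stand-ins written 
for this description; they are not the arrow-rs `DictionaryArray` or 
`RunArray` types, and this is not the code in the PR.

```rust
/// Hypothetical stand-in for a dictionary-encoded column: each row stores a
/// key into `values`; a `None` key marks a null row.
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

impl DictColumn {
    fn len(&self) -> usize {
        self.keys.len()
    }

    /// Resolve a logical row to its value; null or out-of-range rows yield `None`.
    fn value(&self, row: usize) -> Option<&str> {
        let key = self.keys.get(row).copied().flatten()?;
        self.values.get(key).map(String::as_str)
    }

    /// Expand every logical row into a plain vector.
    fn to_vec(&self) -> Vec<Option<&str>> {
        (0..self.len()).map(|row| self.value(row)).collect()
    }
}

/// Hypothetical stand-in for a run-end encoded column: `run_ends[i]` is the
/// exclusive logical end of run `i`, and `values[i]` is that run's value.
struct RunEndColumn {
    run_ends: Vec<usize>,
    values: Vec<Option<String>>,
}

impl RunEndColumn {
    /// The logical length is the last run end (0 for an empty column).
    fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Binary-search for the first run whose end lies past `row`.
    fn value(&self, row: usize) -> Option<&str> {
        if row >= self.len() {
            return None;
        }
        let run = self.run_ends.partition_point(|&end| end <= row);
        self.values.get(run)?.as_deref()
    }

    /// Expand every logical row into a plain vector.
    fn to_vec(&self) -> Vec<Option<&str>> {
        (0..self.len()).map(|row| self.value(row)).collect()
    }
}
```

   In both cases the number of logical rows differs from the number of stored 
values, so a per-row builder has to resolve the physical index before it can 
emit a value.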
   
   Rather than writing separate iteration code for each of these two types, 
this PR closes the gap by routing them through the existing 
`ArrowToVariantRowBuilder` abstraction.
   
   # What changes are included in this PR?
   
   1. **Extended `UnshredVariantRowBuilder` State Machine**: Added an 
`Arrow(ArrowUnshredRowBuilder)` variant to handle data types that the 
Arrow-to-Variant conversion already supports but that have no dedicated 
unshredding implementation.
   2. **Plumbed `CastOptions` Recursively**: Propagated `CastOptions` through 
the recursive `try_new_opt` calls so that type conversions behave 
consistently across nested structs, lists, and view types.
   3. **Null and Fallback Handling**: Hooked the new row builder into the 
`handle_unshredded_case!` macro so that early row nulls and raw byte-value 
fallbacks are still handled with low overhead.
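   
   The shape of the first change can be sketched as an enum with dedicated 
variants plus one delegating variant. This is a simplified, std-only 
illustration: `Cell`, `RowBuilder`, `build_column`, and `dictionary_fallback` 
are hypothetical names invented for this sketch, and the real 
`UnshredVariantRowBuilder` and `ArrowUnshredRowBuilder` types are not 
reproduced here.

```rust
/// Hypothetical output cell standing in for a Variant value.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Null,
    Int(i64),
    Text(String),
}

/// Hypothetical row builder: one dedicated variant per primitive layout and
/// one generic variant that delegates to an existing general-purpose converter.
enum RowBuilder<'a> {
    /// Dedicated path for a plain integer column.
    Int(&'a [Option<i64>]),
    /// Fallback path: any layout that a generic per-row converter can read.
    Generic(Box<dyn Fn(usize) -> Cell + 'a>),
}

impl<'a> RowBuilder<'a> {
    /// Append the value for `row` using whichever path this builder holds.
    fn append_row(&self, row: usize, out: &mut Vec<Cell>) {
        match self {
            RowBuilder::Int(values) => out.push(values[row].map_or(Cell::Null, Cell::Int)),
            RowBuilder::Generic(convert) => out.push(convert(row)),
        }
    }
}

/// Build all `len` rows with the given builder.
fn build_column(builder: &RowBuilder<'_>, len: usize) -> Vec<Cell> {
    let mut out = Vec::with_capacity(len);
    for row in 0..len {
        builder.append_row(row, &mut out);
    }
    out
}

/// Example fallback: a dictionary-encoded text column served by the generic path.
fn dictionary_fallback() -> RowBuilder<'static> {
    let keys = [Some(1usize), None, Some(0)];
    let values = ["a", "b"];
    RowBuilder::Generic(Box::new(move |row| match keys[row] {
        Some(key) => Cell::Text(values[key].to_string()),
        None => Cell::Null,
    }))
}
```

   The point of the delegating variant is that a new layout needs no new 
match arm as long as the generic converter already understands it.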
   
   # Are these changes tested?
   
   Yes, unit tests have been added to `unshred_variant.rs` to cover the new 
paths:
   - `test_unshred_dictionary_typed_value`: Validates dictionary key-to-value 
resolution, null keys, and repeated keys.
   - `test_unshred_run_end_encoded_typed_value`: Verifies run boundary 
handling and multi-row string value reconstruction.
   
   # Are there any user-facing changes?
   
   No. This change is additive and non-breaking. It extends data type 
coverage for the experimental Parquet Variant support without modifying any 
public function signatures or traits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
