scovich commented on code in PR #8521:
URL: https://github.com/apache/arrow-rs/pull/8521#discussion_r2396089738
##########
parquet-variant-compute/src/arrow_to_variant.rs:
##########
@@ -38,6 +38,98 @@ use parquet_variant::{
use std::collections::HashMap;
use std::ops::Range;
+// ============================================================================
+// Shared traits and helpers for Arrow-to-Variant conversion
+// ============================================================================
+
+/// Zero-cost trait for converting Arrow array values to Variant
+pub(crate) trait ArrowToVariant: Array {
+ fn append_to_variant_builder(
+ &self,
+ builder: &mut impl VariantBuilderExt,
+ index: usize,
+ ) -> Result<(), ArrowError>;
+}
+
+/// Macro to define ArrowToVariant implementations with optional value
transformation
+macro_rules! define_arrow_to_variant {
Review Comment:
Here's an LLM-generated compare/contrast, in case that's helpful:
<details>
## Analysis of Current Consolidation Effort
### Current State
The diff shows an attempt to consolidate primitive type handling by:
1. **Introducing shared `ArrowToVariant` trait** - A zero-cost trait that
both modules can use
2. **Adding `define_arrow_to_variant` macro** - A simpler macro for basic
type conversions
3. **Sharing timestamp conversion logic** - `shared_timestamp_to_variant`
function
4. **Partial adoption in unshred_variant.rs** - Using the shared trait for
most primitive types
### Key Differences Between Approaches
#### **arrow_to_variant.rs (Cast semantics)**
- **Philosophy**: Convert any Arrow type to Variant, with flexible error
handling
- **Macro**: `define_row_builder!` - Complex, feature-rich macro supporting:
- Optional extra fields (like `CastOptions`, `scale`)
- Fallible transformations with `Option<T>` return types
- Strict vs non-strict error handling modes
- Generic type parameters and where clauses
- **Primitive definition**: `T::Native: Into<Variant>` constraint
- **Decimal support**: Full support for all decimal types with overflow
handling
- **FixedSizeBinary**: Accepts any size, converts to `Variant::Binary`
#### **unshred_variant.rs (Unshred semantics)**
- **Philosophy**: Reconstruct original Variant from shredded representation
- **Macro**: `define_arrow_to_variant!` - Simpler macro supporting:
- Basic value transformations
- Simple error propagation
- No extra configuration fields
- **Primitive definition**: Any type implementing `ArrowToVariant` trait
- **Decimal support**: Missing (not implemented)
- **FixedSizeBinary**: Only size 16 allowed, converts to `Variant::Uuid`
### Areas of Redundancy
#### 1. **Primitive Type Implementations**
Both modules have nearly identical logic for:
- Basic integer types (i8, i16, i32, i64, u8, u16, u32, u64)
- Floating point types (f32, f64, f16)
- Boolean, String, BinaryView
- Date32, Time64Microsecond
- Timestamp types (with timezone handling)
#### 2. **Enum Variants and Match Arms**
Both have large enums with similar variants:
```rust
// arrow_to_variant.rs
PrimitiveInt8(PrimitiveArrowToVariantBuilder<'a, Int8Type>),
PrimitiveInt16(PrimitiveArrowToVariantBuilder<'a, Int16Type>),
// ... 12+ more primitive variants
// unshred_variant.rs
PrimitiveInt8(UnshredPrimitiveRowBuilder<'a, PrimitiveArray<Int8Type>>),
PrimitiveInt16(UnshredPrimitiveRowBuilder<'a, PrimitiveArray<Int16Type>>),
// ... 10+ more primitive variants
```
#### 3. **Factory Pattern Logic**
Both have similar DataType matching logic to create appropriate builders.
### Semantic Differences That Prevent Full Sharing
1. **FixedSizeBinary Handling**:
- Cast: Any size → `Variant::Binary`
- Unshred: Only size 16 → `Variant::Uuid`
2. **Error Handling Philosophy**:
- Cast: Configurable strict/non-strict modes, overflow becomes
`Variant::Null`
- Unshred: Strict validation, errors propagate up
3. **Decimal Support**:
- Cast: Full decimal support with scale handling and overflow detection
- Unshred: No decimal support (missing data point)
4. **Method Signatures**:
- Cast: `append_row(builder, index)`
- Unshred: `append_row(builder, metadata, index)` - needs metadata for
unshredded values
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]