scovich commented on code in PR #9681:
URL: https://github.com/apache/arrow-rs/pull/9681#discussion_r3382818159


##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -250,6 +266,18 @@ fn shredded_get_path(
             }
             ShreddedPathStep::Missing => {
                 let num_rows = input.len();
+                if 
as_field.is_some_and(Field::has_valid_extension_type::<VariantType>) {
+                    let all_nulls = 
Some(arrow::buffer::NullBuffer::from(vec![false; num_rows]));
+                    let arr = VariantArray::from_parts(

Review Comment:
   Nit: Could save a line by
   ```rust
   ArrayRef::from(VariantArray::from_parts(
     ...
   ))
   ```
   ... but I didn't think `VariantArray` implemented `Array` (it cannot safely 
do so, lacking a `DataType::Variant` to pair with), so how can it be converted 
to `ArrayRef` at all?



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -250,6 +266,18 @@ fn shredded_get_path(
             }
             ShreddedPathStep::Missing => {
                 let num_rows = input.len();
+                if 
as_field.is_some_and(Field::has_valid_extension_type::<VariantType>) {
+                    let all_nulls = 
Some(arrow::buffer::NullBuffer::from(vec![false; num_rows]));
+                    let arr = VariantArray::from_parts(
+                        // Propagating metadata is not necessary for an 
all-NULL array, but is cheaper than constructing
+                        // a new empty metadata array. (n * 3 bytes vs Arc 
bump)
+                        input.metadata_field().clone(),
+                        None,
+                        None,
+                        all_nulls,
+                    );

Review Comment:
   ```suggestion
                       // Propagating metadata is not necessary for an all-NULL 
array, but is cheaper than constructing
                       // a new empty metadata array. (n * 3 bytes vs Arc bump)
                       let metadata = input.metadata_field().clone();
                       Ok(ArrayRef::from(VariantArray::from_parts(metadata, 
None, None, all_nulls)))
   ```



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -250,6 +266,18 @@ fn shredded_get_path(
             }
             ShreddedPathStep::Missing => {
                 let num_rows = input.len();
+                if 
as_field.is_some_and(Field::has_valid_extension_type::<VariantType>) {
+                    let all_nulls = 
Some(arrow::buffer::NullBuffer::from(vec![false; num_rows]));
+                    let arr = VariantArray::from_parts(

Review Comment:
   Ah! We have an `impl From<VariantArray> for ArrayRef` that returns the inner 
`StructArray`.



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -204,7 +204,23 @@ fn shredded_get_path(
     // Helper that shreds a VariantArray to a specific type.
     let shred_basic_variant =
         |target: VariantArray, path: VariantPath<'_>, as_field: 
Option<&Field>| {

Review Comment:
   This helper just became a lot more complicated, could use some inline code 
comments explaining how this new `requested_variant` stuff impacts the control 
flow?
   
   e.g. "if the caller requested variant, unshred to binary it before 
continuing; if they also provided a path to traverse, don't pass a type so that 
the requested field can be fetched directly."
   
   But that opens a bunch of questions:
   1. Why unshred the whole thing if the user provided a non-empty `path`? 
Wouldn't it be better to any shredding as deeply as possible first? Or did the 
caller of `shred_basic_variant` already do that?
   2. This code seems to ignore any shredded variant schema the user might have 
provided. I get that we don't necessarily want to tackle variant-get with 
custom shredding in this PR, but we should at least block it with an error 
instead of silently returning something different than was requested?



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -293,30 +321,30 @@ fn shredded_get_path(
     //
     // For shredded/partially-shredded targets (`typed_value` present), 
recurse into each field
     // separately to take advantage of deeper shredding in child fields.
-    if let DataType::Struct(fields) = as_field.data_type() {
-        if target.typed_value_field().is_none() {
-            return shred_basic_variant(target, VariantPath::default(), 
Some(as_field));
-        }
-
-        let children = fields
-            .iter()
-            .map(|field| {
-                shredded_get_path(
-                    &target,
-                    &[VariantPathElement::from(field.name().as_str())],
-                    Some(field),
-                    cast_options,
-                )
-            })
-            .collect::<Result<Vec<_>>>()?;
-
-        let struct_nulls = target.nulls().cloned();
+    if !as_field.has_valid_extension_type::<VariantType>() {
+        if let DataType::Struct(fields) = as_field.data_type() {
+            if target.typed_value_field().is_none() {
+                return shred_basic_variant(target, VariantPath::default(), 
Some(as_field));
+            }
 
-        return Ok(Arc::new(StructArray::try_new(
-            fields.clone(),
-            children,
-            struct_nulls,
-        )?));
+            let children = fields
+                .iter()
+                .map(|field| {
+                    shredded_get_path(
+                        &target,
+                        &[VariantPathElement::from(field.name().as_str())],
+                        Some(field),
+                        cast_options,
+                    )

Review Comment:
   nit:
   ```suggestion
                       let path = 
&[VariantPathElement::from(field.name().as_str())];
                       shredded_get_path(&target, path, Some(field), 
cast_options)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to