scovich commented on code in PR #9681:
URL: https://github.com/apache/arrow-rs/pull/9681#discussion_r3382818159
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -250,6 +266,18 @@ fn shredded_get_path(
}
ShreddedPathStep::Missing => {
let num_rows = input.len();
+ if
as_field.is_some_and(Field::has_valid_extension_type::<VariantType>) {
+ let all_nulls =
Some(arrow::buffer::NullBuffer::from(vec![false; num_rows]));
+ let arr = VariantArray::from_parts(
Review Comment:
Nit: Could save a line by
```rust
ArrayRef::from(VariantArray::from_parts(
...
))
```
... but I didn't think `VariantArray` implemented `Array` (it cannot safely
do so, lacking a `DataType::Variant` to pair with), so how can it be converted
to `ArrayRef` at all?
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -250,6 +266,18 @@ fn shredded_get_path(
}
ShreddedPathStep::Missing => {
let num_rows = input.len();
+ if
as_field.is_some_and(Field::has_valid_extension_type::<VariantType>) {
+ let all_nulls =
Some(arrow::buffer::NullBuffer::from(vec![false; num_rows]));
+ let arr = VariantArray::from_parts(
+ // Propagating metadata is not necessary for an
all-NULL array, but is cheaper than constructing
+ // a new empty metadata array. (n * 3 bytes vs Arc
bump)
+ input.metadata_field().clone(),
+ None,
+ None,
+ all_nulls,
+ );
Review Comment:
```suggestion
// Propagating metadata is not necessary for an all-NULL
array, but is cheaper than constructing
// a new empty metadata array. (n * 3 bytes vs Arc bump)
let metadata = input.metadata_field().clone();
Ok(ArrayRef::from(VariantArray::from_parts(metadata,
None, None, all_nulls)))
```
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -250,6 +266,18 @@ fn shredded_get_path(
}
ShreddedPathStep::Missing => {
let num_rows = input.len();
+ if
as_field.is_some_and(Field::has_valid_extension_type::<VariantType>) {
+ let all_nulls =
Some(arrow::buffer::NullBuffer::from(vec![false; num_rows]));
+ let arr = VariantArray::from_parts(
Review Comment:
Ah! We have an `impl From<VariantArray> for ArrayRef` that returns the inner
`StructArray`.
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -204,7 +204,23 @@ fn shredded_get_path(
// Helper that shreds a VariantArray to a specific type.
let shred_basic_variant =
|target: VariantArray, path: VariantPath<'_>, as_field:
Option<&Field>| {
Review Comment:
This helper just became a lot more complicated, could use some inline code
comments explaining how this new `requested_variant` stuff impacts the control
flow?
e.g. "if the caller requested variant, unshred to binary it before
continuing; if they also provided a path to traverse, don't pass a type so that
the requested field can be fetched directly."
But that opens a bunch of questions:
1. Why unshred the whole thing if the user provided a non-empty `path`?
Wouldn't it be better to any shredding as deeply as possible first? Or did the
caller of `shred_basic_variant` already do that?
2. This code seems to ignore any shredded variant schema the user might have
provided. I get that we don't necessarily want to tackle variant-get with
custom shredding in this PR, but we should at least block it with an error
instead of silently returning something different than was requested?
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -293,30 +321,30 @@ fn shredded_get_path(
//
// For shredded/partially-shredded targets (`typed_value` present),
recurse into each field
// separately to take advantage of deeper shredding in child fields.
- if let DataType::Struct(fields) = as_field.data_type() {
- if target.typed_value_field().is_none() {
- return shred_basic_variant(target, VariantPath::default(),
Some(as_field));
- }
-
- let children = fields
- .iter()
- .map(|field| {
- shredded_get_path(
- &target,
- &[VariantPathElement::from(field.name().as_str())],
- Some(field),
- cast_options,
- )
- })
- .collect::<Result<Vec<_>>>()?;
-
- let struct_nulls = target.nulls().cloned();
+ if !as_field.has_valid_extension_type::<VariantType>() {
+ if let DataType::Struct(fields) = as_field.data_type() {
+ if target.typed_value_field().is_none() {
+ return shred_basic_variant(target, VariantPath::default(),
Some(as_field));
+ }
- return Ok(Arc::new(StructArray::try_new(
- fields.clone(),
- children,
- struct_nulls,
- )?));
+ let children = fields
+ .iter()
+ .map(|field| {
+ shredded_get_path(
+ &target,
+ &[VariantPathElement::from(field.name().as_str())],
+ Some(field),
+ cast_options,
+ )
Review Comment:
nit:
```suggestion
let path =
&[VariantPathElement::from(field.name().as_str())];
shredded_get_path(&target, path, Some(field),
cast_options)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]