alamb commented on code in PR #7919:
URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2202901396
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -0,0 +1,265 @@
+use std::sync::Arc;
+
+use arrow::{
+ array::{
+ Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray,
PrimitiveBuilder,
+ StructArray,
+ },
+ compute::CastOptions,
+ datatypes::UInt64Type,
+ error::Result,
+};
+use arrow_schema::{ArrowError, DataType, Field};
+use parquet_variant::Variant;
+
+use crate::utils::variant_from_struct_array;
+
+/// Returns an array with the specified path extracted from the variant values.
+pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> {
+ let (struct_array, metadata_array, value_array) =
variant_from_struct_array(input)?;
+
+ // TODO: can we use OffsetBuffer and NullBuffer here instead?
+ // I couldn't find a way to set individual indices here, so went
with vecs.
+ let mut offsets = vec![0; struct_array.len()];
+ let mut nulls = if let Some(struct_nulls) = struct_array.nulls() {
+ struct_nulls.iter().collect()
+ } else {
+ vec![true; struct_array.len()]
+ };
+
+ for path in options
+ .path
+ .0
+ .iter()
+ .take(options.path.0.len().saturating_sub(1))
+ {
+ match path {
+ VariantPathElement::Field { name } => {
+ go_to_object_field(
+ struct_array,
+ metadata_array,
+ value_array,
+ name,
+ &mut offsets,
+ &mut nulls,
+ )?;
+ }
+ VariantPathElement::Index { offset } => {
+ go_to_array_index(
+ struct_array,
+ metadata_array,
+ value_array,
+ *offset,
+ &mut offsets,
+ &mut nulls,
+ )?;
+ }
+ }
+ }
+
+ let as_type = options.as_type.ok_or_else(|| {
Review Comment:
another way to potentially simplify / break this PR up into smaller parts
would be to first focus on a `variant_get` that only supports returning the
requested field as another VariantArray.
We could then work out how to support casting to primitive types after that.
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -0,0 +1,265 @@
+use std::sync::Arc;
+
+use arrow::{
+ array::{
+ Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray,
PrimitiveBuilder,
+ StructArray,
+ },
+ compute::CastOptions,
+ datatypes::UInt64Type,
+ error::Result,
+};
+use arrow_schema::{ArrowError, DataType, Field};
+use parquet_variant::Variant;
+
+use crate::utils::variant_from_struct_array;
+
+/// Returns an array with the specified path extracted from the variant values.
+pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> {
Review Comment:
While trying to review this PR, one thing that I thought might help could be
to separate out the Variant traversal and the construction of the output
For example, perhaps you could implement a function to find the appropriate
sub Variant like
```rust
impl Variant {
fn get_path(&self, path: &VariantPath) -> Option<Variant> {
...
}
}
```
With that building block, you could implement the basic "extract a variant"
kernel to a new VariantArray using the newly introduced `VariantArrayBuilder`
```rust
let mut output_builder = VariantArrayBuilder::new();
// loop over each input row
for input_variant in input_variant_array.iter() {
// copy the input variant to the output builder
let mut vb = VariantBuilder::new()
vb.append(input_variant)
let (metadata, value) = vb.build();
output_builder.append_buffers()
}
```
I think once we get https://github.com/apache/arrow-rs/pull/7914 from
@friendlymatthew in that should work
One downside of this approach is that it will copy the variant over and if
the source and destinations are both BinaryViewArray we can probably be much
more clever
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -0,0 +1,265 @@
+use std::sync::Arc;
+
+use arrow::{
+ array::{
+ Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray,
PrimitiveBuilder,
+ StructArray,
+ },
+ compute::CastOptions,
+ datatypes::UInt64Type,
+ error::Result,
+};
+use arrow_schema::{ArrowError, DataType, Field};
+use parquet_variant::Variant;
+
+use crate::utils::variant_from_struct_array;
+
+/// Returns an array with the specified path extracted from the variant values.
+pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> {
+ let (struct_array, metadata_array, value_array) =
variant_from_struct_array(input)?;
+
+ // TODO: can we use OffsetBuffer and NullBuffer here instead?
+ // I couldn't find a way to set individual indices here, so went
with vecs.
+ let mut offsets = vec![0; struct_array.len()];
+ let mut nulls = if let Some(struct_nulls) = struct_array.nulls() {
+ struct_nulls.iter().collect()
+ } else {
+ vec![true; struct_array.len()]
+ };
+
+ for path in options
Review Comment:
As I mentioned above I think it might be easier to implement this function
as a method on variant itself
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]