scovich commented on code in PR #9021:
URL: https://github.com/apache/arrow-rs/pull/9021#discussion_r2637428193


##########
arrow-json/src/reader/mod.rs:
##########
@@ -373,6 +386,95 @@ impl<R: BufRead> RecordBatchReader for Reader<R> {
     }
 }
 
+/// A trait to create custom decoders for specific data types.
+///
+/// This allows overriding the default decoders for specific data types,
+/// or adding new decoders for custom data types.
+///
+/// # Examples
+///
+/// ```
+/// use arrow_json::{ArrayDecoder, DecoderFactory, TapeElement, Tape, ReaderBuilder, StructMode};
+/// use arrow_schema::ArrowError;
+/// use arrow_schema::{DataType, Field, Fields, Schema};
+/// use arrow_array::cast::AsArray;
+/// use arrow_array::Array;
+/// use arrow_array::builder::StringBuilder;
+/// use arrow_data::ArrayData;
+/// use std::sync::Arc;
+///
+/// struct IncorrectStringAsNullDecoder {}
+///
+/// impl ArrayDecoder for IncorrectStringAsNullDecoder {
+///     fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+///         let mut builder = StringBuilder::new();
+///         for p in pos {
+///             match tape.get(*p) {
+///                 TapeElement::String(idx) => {
+///                     builder.append_value(tape.get_string(idx));
+///                 }
+///                 _ => builder.append_null(),
+///             }
+///         }
+///         Ok(builder.finish().into_data())
+///     }
+/// }
+///
+/// #[derive(Debug)]
+/// struct IncorrectStringAsNullDecoderFactory;
+///
+/// impl DecoderFactory for IncorrectStringAsNullDecoderFactory {
+///     fn make_default_decoder<'a>(
+///         &self,
+///         _field: Option<FieldRef>,
+///         data_type: DataType,
+///         _coerce_primitive: bool,
+///         _strict_mode: bool,
+///         _is_nullable: bool,
+///         _struct_mode: StructMode,
+///     ) -> Result<Option<Box<dyn ArrayDecoder>>, ArrowError> {
+///         match data_type {
+///             DataType::Utf8 => Ok(Some(Box::new(IncorrectStringAsNullDecoder {}))),
+///             _ => Ok(None),
+///         }
+///     }
+/// }
+///
+/// let json = r#"
+/// {"a": "a"}
+/// {"a": 12}
+/// "#;
+/// let batch = ReaderBuilder::new(Arc::new(Schema::new(Fields::from(vec![Field::new(
+///     "a",
+///     DataType::Utf8,
+///     true,
+/// )]))))
+/// .with_decoder_factory(Arc::new(IncorrectStringAsNullDecoderFactory))
+/// .build(json.as_bytes())
+/// .unwrap()
+/// .next()
+/// .unwrap()
+/// .unwrap();
+///
+/// let values = batch.column(0).as_string::<i32>();
+/// assert_eq!(values.len(), 2);
+/// assert_eq!(values.value(0), "a");
+/// assert!(values.is_null(1));
+/// ```
+pub trait DecoderFactory: std::fmt::Debug + Send + Sync {
+    /// Make a decoder that overrides the default decoder for a specific data type.
+    /// This can be used to override how, for example, errors during decoding are handled.
+    fn make_default_decoder(

Review Comment:
   This is IMO a misleading method name:
   * The "default" decoder is whatever would have been created if no factory existed.
   * A factory is used to _override_ that default.
   
   So it seems like `make_decoder` or `make_custom_decoder` would be more descriptive?
   



##########
arrow-json/src/reader/mod.rs:
##########
@@ -686,12 +789,27 @@ macro_rules! primitive_decoder {
 }
 
 fn make_decoder(
+    field: Option<FieldRef>,
     data_type: DataType,
     coerce_primitive: bool,
     strict_mode: bool,
     is_nullable: bool,
     struct_mode: StructMode,
+    decoder_factory: Option<Arc<dyn DecoderFactory>>,
 ) -> Result<Box<dyn ArrayDecoder>, ArrowError> {
+    if let Some(ref factory) = decoder_factory {
+        if let Some(decoder) = factory.make_default_decoder(
+            field.clone(),
+            data_type.clone(),

Review Comment:
   Tiny nit: We don't _know_ yet that the factory will actually produce a 
decoder. Should the trait take `&Arc` so implementations can decide whether to 
clone them? Or is the overhead so (relatively) low compared to everything else 
involved with parsing that we don't care?
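   Roughly the shape I have in mind, as a compilable sketch with stand-in types (not the real arrow-rs signatures, and using the `make_decoder` name suggested above): the factory borrows its inputs, and only an implementation that actually returns a decoder pays for any clone:

```rust
use std::sync::Arc;

// Stand-ins for FieldRef / DataType / ArrayDecoder, for illustration only.
type FieldRef = Arc<String>;
#[derive(Clone, Debug)]
enum DataType {
    Utf8,
    Int64,
}
trait ArrayDecoder {}
struct Utf8Decoder;
impl ArrayDecoder for Utf8Decoder {}

// Borrowing variant of the factory method: the caller passes references,
// so nothing is cloned up front on the (common) miss path.
trait DecoderFactory {
    fn make_decoder(
        &self,
        field: Option<&FieldRef>,
        data_type: &DataType,
    ) -> Option<Box<dyn ArrayDecoder>>;
}

struct OnlyUtf8;
impl DecoderFactory for OnlyUtf8 {
    fn make_decoder(
        &self,
        _field: Option<&FieldRef>,
        data_type: &DataType,
    ) -> Option<Box<dyn ArrayDecoder>> {
        match data_type {
            // Any clone the implementation needs would happen here,
            // only once we know a decoder will actually be produced.
            DataType::Utf8 => Some(Box::new(Utf8Decoder)),
            DataType::Int64 => None,
        }
    }
}

fn main() {
    let f = OnlyUtf8;
    assert!(f.make_decoder(None, &DataType::Utf8).is_some());
    assert!(f.make_decoder(None, &DataType::Int64).is_none());
}
```

   Whether that is worth the extra lifetime noise versus just eating two cheap clones per column is exactly the question above.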



##########
parquet-variant-compute/src/decoder.rs:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::{Array, StructArray};
+use arrow_json::{DecoderFactory, StructMode};
+use parquet_variant::{ObjectFieldBuilder, Variant, VariantBuilder, VariantBuilderExt};
+use crate::{VariantArrayBuilder, VariantType};
+use arrow_data::ArrayData;
+use arrow_schema::{ArrowError, DataType, FieldRef};
+
+use arrow_json::reader::ArrayDecoder;
+use arrow_json::reader::{Tape, TapeElement};
+
+/// An [`ArrayDecoder`] implementation that decodes JSON values into a Variant array.
+///
+/// This decoder converts JSON tape elements (parsed JSON tokens) into Parquet Variant
+/// format, preserving the full structure of arbitrary JSON including nested objects,
+/// arrays, and primitive types.
+///
+/// This decoder is typically used indirectly via [`VariantArrayDecoderFactory`] when
+/// reading JSON data into Variant columns.
+#[derive(Default)]
+pub struct VariantArrayDecoder;

Review Comment:
   I just realized! When combined with the variant shredding struct translation 
work done elsewhere -- and the full-custom decoding support I mentioned above 
-- we could define a `ShreddedVariantArrayDecoder` that turns JSON directly 
into shredded variant. That would be super slick when parsing 
mostly-strongly-typed schemas.



##########
parquet-variant-compute/src/decoder.rs:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::{Array, StructArray};
+use arrow_json::{DecoderFactory, StructMode};
+use parquet_variant::{ObjectFieldBuilder, Variant, VariantBuilder, VariantBuilderExt};
+use crate::{VariantArrayBuilder, VariantType};
+use arrow_data::ArrayData;
+use arrow_schema::{ArrowError, DataType, FieldRef};
+
+use arrow_json::reader::ArrayDecoder;
+use arrow_json::reader::{Tape, TapeElement};
+
+/// An [`ArrayDecoder`] implementation that decodes JSON values into a Variant array.
+///
+/// This decoder converts JSON tape elements (parsed JSON tokens) into Parquet Variant
+/// format, preserving the full structure of arbitrary JSON including nested objects,
+/// arrays, and primitive types.
+///
+/// This decoder is typically used indirectly via [`VariantArrayDecoderFactory`] when
+/// reading JSON data into Variant columns.
+#[derive(Default)]
+pub struct VariantArrayDecoder;
+
+impl ArrayDecoder for VariantArrayDecoder {
+    fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+        let mut array_builder = VariantArrayBuilder::new(pos.len());
+        for p in pos {
+            let mut builder = VariantBuilder::new();
+            variant_from_tape_element(&mut builder, *p, tape)?;
+            let (metadata, value) = builder.finish();
+            array_builder.append_value(Variant::new(&metadata, &value));
+        }
+        let variant_struct_array: StructArray = array_builder.build().into();

Review Comment:
   nit: If type annotation is anyway needed, consider From instead of Into:
   
   ```suggestion
           let variant_struct_array = StructArray::from(array_builder.build());
   ```



##########
parquet-variant-compute/src/decoder.rs:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::{Array, StructArray};
+use arrow_json::{DecoderFactory, StructMode};
+use parquet_variant::{ObjectFieldBuilder, Variant, VariantBuilder, VariantBuilderExt};
+use crate::{VariantArrayBuilder, VariantType};
+use arrow_data::ArrayData;
+use arrow_schema::{ArrowError, DataType, FieldRef};
+
+use arrow_json::reader::ArrayDecoder;
+use arrow_json::reader::{Tape, TapeElement};
+
+/// An [`ArrayDecoder`] implementation that decodes JSON values into a Variant array.
+///
+/// This decoder converts JSON tape elements (parsed JSON tokens) into Parquet Variant
+/// format, preserving the full structure of arbitrary JSON including nested objects,
+/// arrays, and primitive types.
+///
+/// This decoder is typically used indirectly via [`VariantArrayDecoderFactory`] when
+/// reading JSON data into Variant columns.
+#[derive(Default)]
+pub struct VariantArrayDecoder;
+
+impl ArrayDecoder for VariantArrayDecoder {
+    fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+        let mut array_builder = VariantArrayBuilder::new(pos.len());
+        for p in pos {
+            let mut builder = VariantBuilder::new();
+            variant_from_tape_element(&mut builder, *p, tape)?;
+            let (metadata, value) = builder.finish();
+            array_builder.append_value(Variant::new(&metadata, &value));
+        }
+        let variant_struct_array: StructArray = array_builder.build().into();
+        Ok(variant_struct_array.into_data())
+    }
+}
+
+/// A [`DecoderFactory`] that creates [`VariantArrayDecoder`] instances for Variant-typed fields.
+///
+/// This factory integrates with the Arrow JSON reader to automatically decode JSON values
+/// into Variant arrays when the target field is registered as a [`VariantType`] extension type.
+///
+/// # Example
+///
+/// ```ignore
+/// use arrow_json::reader::ReaderBuilder;
+/// use arrow_json::StructMode;
+/// use std::sync::Arc;
+///
+/// let builder = ReaderBuilder::new(Arc::new(schema));
+/// let reader = builder
+///     .with_struct_mode(StructMode::ObjectOnly)
+///     .with_decoder_factory(Arc::new(VariantArrayDecoderFactory))
+///     .build(json_input)?;
+/// ```
+#[derive(Default, Debug)]
+#[allow(unused)]
+pub struct VariantArrayDecoderFactory;
+
+impl DecoderFactory for VariantArrayDecoderFactory {
+    fn make_default_decoder<'a>(&self, field: Option<FieldRef>, 
+        _data_type: DataType,
+        _coerce_primitive: bool,
+        _strict_mode: bool,
+        _is_nullable: bool,
+        _struct_mode: StructMode,
+    ) -> Result<Option<Box<dyn ArrayDecoder>>, ArrowError> {
+        if let Some(field) = field && field.try_extension_type::<VariantType>().is_ok() {
+            return Ok(Some(Box::new(VariantArrayDecoder)));
+        }
+       Ok(None)
+    }
+}
+
+fn variant_from_tape_element(builder: &mut impl VariantBuilderExt, mut p: u32, tape: &Tape) -> Result<u32, ArrowError> {
+    match tape.get(p) {
+        TapeElement::StartObject(end_idx) => {
+            let mut object_builder = builder.try_new_object()?;
+            p += 1;
+            while p < end_idx {
+                // Read field name
+                let field_name = match tape.get(p) {
+                    TapeElement::String(s) => tape.get_string(s),
+                    _ => return Err(tape.error(p, "field name")),
+                };
+        
+                let mut field_builder = ObjectFieldBuilder::new(field_name, &mut object_builder);
+                p = tape.next(p, "field value")?;
+                p = variant_from_tape_element(&mut field_builder, p, tape)?;
+            }
+            object_builder.finish();
+        }
+        TapeElement::EndObject(_u32) => { return Err(ArrowError::JsonError("unexpected end of object".to_string())) },

Review Comment:
   I'm pretty sure fmt doesn't ever produce this pattern. Is CI not catching it 
somehow?
   (I saw some long lines earlier as well)
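   For reference, the usual local equivalents of those checks (a sketch; the exact arrow-rs CI job definitions may differ):

```shell
# Check formatting without rewriting files; exits non-zero on any diff.
cargo fmt --all -- --check

# Clippy catches some further style issues (line width itself is fmt territory).
cargo clippy --all-targets --all-features -- -D warnings
```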



##########
parquet-variant-compute/src/decoder.rs:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::{Array, StructArray};
+use arrow_json::{DecoderFactory, StructMode};
+use parquet_variant::{ObjectFieldBuilder, Variant, VariantBuilder, VariantBuilderExt};
+use crate::{VariantArrayBuilder, VariantType};
+use arrow_data::ArrayData;
+use arrow_schema::{ArrowError, DataType, FieldRef};
+
+use arrow_json::reader::ArrayDecoder;
+use arrow_json::reader::{Tape, TapeElement};
+
+/// An [`ArrayDecoder`] implementation that decodes JSON values into a Variant array.
+///
+/// This decoder converts JSON tape elements (parsed JSON tokens) into Parquet Variant
+/// format, preserving the full structure of arbitrary JSON including nested objects,
+/// arrays, and primitive types.
+///
+/// This decoder is typically used indirectly via [`VariantArrayDecoderFactory`] when
+/// reading JSON data into Variant columns.
+#[derive(Default)]
+pub struct VariantArrayDecoder;
+
+impl ArrayDecoder for VariantArrayDecoder {
+    fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+        let mut array_builder = VariantArrayBuilder::new(pos.len());
+        for p in pos {
+            let mut builder = VariantBuilder::new();
+            variant_from_tape_element(&mut builder, *p, tape)?;
+            let (metadata, value) = builder.finish();
+            array_builder.append_value(Variant::new(&metadata, &value));
+        }
+        let variant_struct_array: StructArray = array_builder.build().into();
+        Ok(variant_struct_array.into_data())
+    }
+}
+
+/// A [`DecoderFactory`] that creates [`VariantArrayDecoder`] instances for Variant-typed fields.
+///
+/// This factory integrates with the Arrow JSON reader to automatically decode JSON values
+/// into Variant arrays when the target field is registered as a [`VariantType`] extension type.
+///
+/// # Example
+///
+/// ```ignore
+/// use arrow_json::reader::ReaderBuilder;
+/// use arrow_json::StructMode;
+/// use std::sync::Arc;
+///
+/// let builder = ReaderBuilder::new(Arc::new(schema));
+/// let reader = builder
+///     .with_struct_mode(StructMode::ObjectOnly)
+///     .with_decoder_factory(Arc::new(VariantArrayDecoderFactory))
+///     .build(json_input)?;
+/// ```
+#[derive(Default, Debug)]
+#[allow(unused)]
+pub struct VariantArrayDecoderFactory;
+
+impl DecoderFactory for VariantArrayDecoderFactory {
+    fn make_default_decoder<'a>(&self, field: Option<FieldRef>, 
+        _data_type: DataType,
+        _coerce_primitive: bool,
+        _strict_mode: bool,
+        _is_nullable: bool,
+        _struct_mode: StructMode,
+    ) -> Result<Option<Box<dyn ArrayDecoder>>, ArrowError> {
+        if let Some(field) = field && field.try_extension_type::<VariantType>().is_ok() {

Review Comment:
   Meanwhile tho -- wouldn't it be nicer to have a boolean-returning compat 
checker on extension types, that doesn't require allocating the actual 
extension type?
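   Something like the following, sketched with stand-in types (the `is_extension_type` method and the metadata value are hypothetical, not an existing arrow-schema API; the `ARROW:extension:name` key is the standard Arrow field-metadata key):

```rust
use std::collections::HashMap;

// Minimal stand-ins for arrow_schema::Field and ExtensionType, for illustration.
struct Field {
    metadata: HashMap<String, String>,
}

trait ExtensionType {
    const NAME: &'static str;
}

struct VariantType;
impl ExtensionType for VariantType {
    // Illustrative name string; the real value is whatever VariantType uses.
    const NAME: &'static str = "parquet.variant";
}

impl Field {
    // Hypothetical boolean check: compares the extension-name metadata entry
    // without constructing (or allocating) the extension type instance.
    fn is_extension_type<E: ExtensionType>(&self) -> bool {
        self.metadata
            .get("ARROW:extension:name")
            .is_some_and(|n| n.as_str() == E::NAME)
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "parquet.variant".to_string(),
    );
    let field = Field { metadata };
    assert!(field.is_extension_type::<VariantType>());
}
```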



##########
arrow-json/src/reader/mod.rs:
##########
@@ -373,6 +386,95 @@ impl<R: BufRead> RecordBatchReader for Reader<R> {
     }
 }
 
+/// A trait to create custom decoders for specific data types.
+///
+/// This allows overriding the default decoders for specific data types,
+/// or adding new decoders for custom data types.
+///
+/// # Examples
+///
+/// ```
+/// use arrow_json::{ArrayDecoder, DecoderFactory, TapeElement, Tape, ReaderBuilder, StructMode};
+/// use arrow_schema::ArrowError;
+/// use arrow_schema::{DataType, Field, Fields, Schema};
+/// use arrow_array::cast::AsArray;
+/// use arrow_array::Array;
+/// use arrow_array::builder::StringBuilder;
+/// use arrow_data::ArrayData;
+/// use std::sync::Arc;
+///
+/// struct IncorrectStringAsNullDecoder {}
+///
+/// impl ArrayDecoder for IncorrectStringAsNullDecoder {
+///     fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+///         let mut builder = StringBuilder::new();
+///         for p in pos {
+///             match tape.get(*p) {
+///                 TapeElement::String(idx) => {
+///                     builder.append_value(tape.get_string(idx));
+///                 }
+///                 _ => builder.append_null(),
+///             }
+///         }
+///         Ok(builder.finish().into_data())
+///     }
+/// }
+///
+/// #[derive(Debug)]
+/// struct IncorrectStringAsNullDecoderFactory;
+///
+/// impl DecoderFactory for IncorrectStringAsNullDecoderFactory {
+///     fn make_default_decoder<'a>(
+///         &self,
+///         _field: Option<FieldRef>,
+///         data_type: DataType,
+///         _coerce_primitive: bool,
+///         _strict_mode: bool,
+///         _is_nullable: bool,
+///         _struct_mode: StructMode,
+///     ) -> Result<Option<Box<dyn ArrayDecoder>>, ArrowError> {
+///         match data_type {
+///             DataType::Utf8 => Ok(Some(Box::new(IncorrectStringAsNullDecoder {}))),
+///             _ => Ok(None),
+///         }
+///     }
+/// }
+///
+/// let json = r#"
+/// {"a": "a"}
+/// {"a": 12}
+/// "#;
+/// let batch = ReaderBuilder::new(Arc::new(Schema::new(Fields::from(vec![Field::new(
+///     "a",
+///     DataType::Utf8,
+///     true,
+/// )]))))
+/// .with_decoder_factory(Arc::new(IncorrectStringAsNullDecoderFactory))
+/// .build(json.as_bytes())
+/// .unwrap()
+/// .next()
+/// .unwrap()
+/// .unwrap();
+///
+/// let values = batch.column(0).as_string::<i32>();
+/// assert_eq!(values.len(), 2);
+/// assert_eq!(values.value(0), "a");
+/// assert!(values.is_null(1));
+/// ```
+pub trait DecoderFactory: std::fmt::Debug + Send + Sync {

Review Comment:
   If I am understanding correctly here, one could _almost_ use this framework 
to set up a fully customizable decoding. But not quite.
   
   The decoder factory could intercept the top-level struct of the input schema 
and assign a custom decoder to every field based on path and/or type. That 
could be really slick for custom error handling, for example turning all 
incompatible inputs to NULL instead of producing errors (potentially only for a 
subset of known-funky fields while leaving "reliable" fields fully strongly and 
strictly typed). Problem is, if the decoder factory intercepts a complex type, 
it _has_ to provide a custom decoder for every field of that struct (and wrap 
them all in a custom complex type decoder). Ouch. 
   
   I don't know that we would want to make all the internal decoders public -- 
they are currently `pub` members of private modules -- but is there perhaps a 
way to give decoder factories access to some functionality like `make_decoder`, 
even if only implicitly? A way to say "do whatever you normally would have done 
here"? 
   
   For primitives, it might be as simple as defining a `DefaultDecoderFactory` 
that forwards blindly to `make_decoder`. 
   
   But struct/map/list would need something more, because the decoder factory 
for a complex type would need to assemble the custom sub-decoders of that 
type's children (if not defaulted). Maybe the hypothetical 
`DefaultDecoderFactory` would have associated `make_xxx_decoder` functions 
which are just validating wrappers that create the appropriate decoder directly 
from user-provided child decoders? e.g. `make_list_decoder` would take a single 
`Box<dyn ArrayDecoder>`, `make_map_decoder` would take two decoders, and 
`make_struct_decoder` would take a vec of field decoders. Alternatively, we 
could make the complex type decoders `pub` and define actual `try_new` 
constructors. But that would increase the number of public types in an 
asymmetric way (where would that leave the primitive decoders?).
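   The "do whatever you normally would have done here" idea can be sketched as factory composition (all names hypothetical, stand-in types rather than the real arrow-json ones): the user factory overrides just the fields it cares about and forwards everything else to a built-in fallback.

```rust
// Stand-ins for the real arrow-json types, for illustration only.
trait ArrayDecoder {}
#[derive(Clone, Debug)]
enum DataType {
    Utf8,
    Int64,
}

trait DecoderFactory {
    fn make_decoder(&self, data_type: &DataType) -> Option<Box<dyn ArrayDecoder>>;
}

struct DefaultDecoder; // stand-in for whatever `make_decoder` builds internally
impl ArrayDecoder for DefaultDecoder {}
struct LenientUtf8Decoder; // e.g. turns incompatible inputs into NULL
impl ArrayDecoder for LenientUtf8Decoder {}

// Hypothetical built-in factory that forwards to the reader's own decoders.
struct DefaultDecoderFactory;
impl DecoderFactory for DefaultDecoderFactory {
    fn make_decoder(&self, _data_type: &DataType) -> Option<Box<dyn ArrayDecoder>> {
        Some(Box::new(DefaultDecoder))
    }
}

// User factory: override strings, defer every other type to the default.
struct LenientStrings {
    fallback: DefaultDecoderFactory,
}
impl DecoderFactory for LenientStrings {
    fn make_decoder(&self, data_type: &DataType) -> Option<Box<dyn ArrayDecoder>> {
        match data_type {
            DataType::Utf8 => Some(Box::new(LenientUtf8Decoder)),
            other => self.fallback.make_decoder(other),
        }
    }
}

fn main() {
    let f = LenientStrings {
        fallback: DefaultDecoderFactory,
    };
    assert!(f.make_decoder(&DataType::Utf8).is_some());
    assert!(f.make_decoder(&DataType::Int64).is_some());
}
```

   As noted above, this only covers primitives cleanly; the complex types would still need some way to assemble default child decoders.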



##########
parquet-variant-compute/src/decoder.rs:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::{Array, StructArray};
+use arrow_json::{DecoderFactory, StructMode};
+use parquet_variant::{ObjectFieldBuilder, Variant, VariantBuilder, VariantBuilderExt};
+use crate::{VariantArrayBuilder, VariantType};
+use arrow_data::ArrayData;
+use arrow_schema::{ArrowError, DataType, FieldRef};
+
+use arrow_json::reader::ArrayDecoder;
+use arrow_json::reader::{Tape, TapeElement};
+
+/// An [`ArrayDecoder`] implementation that decodes JSON values into a Variant array.
+///
+/// This decoder converts JSON tape elements (parsed JSON tokens) into Parquet Variant
+/// format, preserving the full structure of arbitrary JSON including nested objects,
+/// arrays, and primitive types.
+///
+/// This decoder is typically used indirectly via [`VariantArrayDecoderFactory`] when
+/// reading JSON data into Variant columns.
+#[derive(Default)]
+pub struct VariantArrayDecoder;
+
+impl ArrayDecoder for VariantArrayDecoder {
+    fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+        let mut array_builder = VariantArrayBuilder::new(pos.len());
+        for p in pos {
+            let mut builder = VariantBuilder::new();
+            variant_from_tape_element(&mut builder, *p, tape)?;
+            let (metadata, value) = builder.finish();
+            array_builder.append_value(Variant::new(&metadata, &value));
+        }
+        let variant_struct_array: StructArray = array_builder.build().into();
+        Ok(variant_struct_array.into_data())
+    }
+}
+
+/// A [`DecoderFactory`] that creates [`VariantArrayDecoder`] instances for Variant-typed fields.
+///
+/// This factory integrates with the Arrow JSON reader to automatically decode JSON values
+/// into Variant arrays when the target field is registered as a [`VariantType`] extension type.
+///
+/// # Example
+///
+/// ```ignore
+/// use arrow_json::reader::ReaderBuilder;
+/// use arrow_json::StructMode;
+/// use std::sync::Arc;
+///
+/// let builder = ReaderBuilder::new(Arc::new(schema));
+/// let reader = builder
+///     .with_struct_mode(StructMode::ObjectOnly)
+///     .with_decoder_factory(Arc::new(VariantArrayDecoderFactory))
+///     .build(json_input)?;
+/// ```
+#[derive(Default, Debug)]
+#[allow(unused)]
+pub struct VariantArrayDecoderFactory;
+
+impl DecoderFactory for VariantArrayDecoderFactory {
+    fn make_default_decoder<'a>(&self, field: Option<FieldRef>, 
+        _data_type: DataType,
+        _coerce_primitive: bool,
+        _strict_mode: bool,
+        _is_nullable: bool,
+        _struct_mode: StructMode,
+    ) -> Result<Option<Box<dyn ArrayDecoder>>, ArrowError> {
+        if let Some(field) = field && field.try_extension_type::<VariantType>().is_ok() {

Review Comment:
   if-let chaining would bump MSRV, no?
   (if MSRV already bumped... yay!)
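   For the record, the MSRV-friendly equivalents are straightforward; a sketch with a stand-in `Field` type (`is_variant` is a hypothetical check standing in for `try_extension_type::<VariantType>().is_ok()`):

```rust
struct Field {
    name: String,
}

impl Field {
    fn is_variant(&self) -> bool {
        self.name == "v" // stand-in for the real extension-type check
    }
}

// Nested form: same semantics as
// `if let Some(field) = field && field.is_variant() { ... }`, any MSRV.
fn pick(field: Option<&Field>) -> Option<&'static str> {
    if let Some(field) = field {
        if field.is_variant() {
            return Some("variant");
        }
    }
    None
}

// Match-guard form, often tidier for a single boolean condition.
fn pick2(field: Option<&Field>) -> Option<&'static str> {
    match field {
        Some(f) if f.is_variant() => Some("variant"),
        _ => None,
    }
}

fn main() {
    let f = Field {
        name: "v".to_string(),
    };
    assert_eq!(pick(Some(&f)), Some("variant"));
    assert_eq!(pick2(None), None);
}
```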



##########
parquet-variant-compute/src/decoder.rs:
##########
@@ -0,0 +1,252 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_array::{Array, StructArray};
+use arrow_json::{DecoderFactory, StructMode};
+use parquet_variant::{ObjectFieldBuilder, Variant, VariantBuilder, 
VariantBuilderExt};
+use crate::{VariantArrayBuilder, VariantType};
+use arrow_data::ArrayData;
+use arrow_schema::{ArrowError, DataType, FieldRef};
+
+use arrow_json::reader::ArrayDecoder;
+use arrow_json::reader::{Tape, TapeElement};
+
+/// An [`ArrayDecoder`] implementation that decodes JSON values into a Variant 
array.
+///
+/// This decoder converts JSON tape elements (parsed JSON tokens) into Parquet 
Variant
+/// format, preserving the full structure of arbitrary JSON including nested 
objects,
+/// arrays, and primitive types.
+///
+/// This decoder is typically used indirectly via 
[`VariantArrayDecoderFactory`] when
+/// reading JSON data into Variant columns.
+#[derive(Default)]
+pub struct VariantArrayDecoder;
+
+impl ArrayDecoder for VariantArrayDecoder {
+    fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, 
ArrowError> {
+        let mut array_builder = VariantArrayBuilder::new(pos.len());
+        for p in pos {
+            let mut builder = VariantBuilder::new();
+            variant_from_tape_element(&mut builder, *p, tape)?;
+            let (metadata, value) = builder.finish();
+            array_builder.append_value(Variant::new(&metadata, &value));

Review Comment:
   `VariantArrayBuilder` implements `VariantBuilderExt` API. You should be able 
to just pass it to `variant_from_tape_element`. Avoids the intermediate vec 
allocations and validation this loop currently pays.
   ```suggestion
               variant_from_tape_element(&mut array_builder, *p, tape)?;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
