scovich commented on code in PR #9021:
URL: https://github.com/apache/arrow-rs/pull/9021#discussion_r2732898935


##########
arrow-json/src/reader/mod.rs:
##########
@@ -373,6 +386,95 @@ impl<R: BufRead> RecordBatchReader for Reader<R> {
     }
 }
 
+/// A trait to create custom decoders for specific data types.
+///
+/// This allows overriding the default decoders for specific data types,
+/// or adding new decoders for custom data types.
+///
+/// # Examples
+///
+/// ```
+/// use arrow_json::{ArrayDecoder, DecoderFactory, TapeElement, Tape, ReaderBuilder, StructMode};
+/// use arrow_schema::ArrowError;
+/// use arrow_schema::{DataType, Field, Fields, Schema};
+/// use arrow_array::cast::AsArray;
+/// use arrow_array::Array;
+/// use arrow_array::builder::StringBuilder;
+/// use arrow_data::ArrayData;
+/// use std::sync::Arc;
+///
+/// struct IncorrectStringAsNullDecoder {}
+///
+/// impl ArrayDecoder for IncorrectStringAsNullDecoder {
+///     fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+///         let mut builder = StringBuilder::new();
+///         for p in pos {
+///             match tape.get(*p) {
+///                 TapeElement::String(idx) => {
+///                     builder.append_value(tape.get_string(idx));
+///                 }
+///                 _ => builder.append_null(),
+///             }
+///         }
+///         Ok(builder.finish().into_data())
+///     }
+/// }
+///
+/// #[derive(Debug)]
+/// struct IncorrectStringAsNullDecoderFactory;
+///
+/// impl DecoderFactory for IncorrectStringAsNullDecoderFactory {
+///     fn make_default_decoder<'a>(
+///         &self,
+///         _field: Option<FieldRef>,
+///         data_type: DataType,
+///         _coerce_primitive: bool,
+///         _strict_mode: bool,
+///         _is_nullable: bool,
+///         _struct_mode: StructMode,
+///     ) -> Result<Option<Box<dyn ArrayDecoder>>, ArrowError> {
+///         match data_type {
+///             DataType::Utf8 => Ok(Some(Box::new(IncorrectStringAsNullDecoder {}))),
+///             _ => Ok(None),
+///         }
+///     }
+/// }
+///
+/// let json = r#"
+/// {"a": "a"}
+/// {"a": 12}
+/// "#;
+/// let batch = ReaderBuilder::new(Arc::new(Schema::new(Fields::from(vec![Field::new(
+///     "a",
+///     DataType::Utf8,
+///     true,
+/// )]))))
+/// .with_decoder_factory(Arc::new(IncorrectStringAsNullDecoderFactory))
+/// .build(json.as_bytes())
+/// .unwrap()
+/// .next()
+/// .unwrap()
+/// .unwrap();
+///
+/// let values = batch.column(0).as_string::<i32>();
+/// assert_eq!(values.len(), 2);
+/// assert_eq!(values.value(0), "a");
+/// assert!(values.is_null(1));
+/// ```
+pub trait DecoderFactory: std::fmt::Debug + Send + Sync {

Review Comment:
   And yes, we'd need to figure out a way to express column paths for nested types. We faced a similar problem in delta-kernel-rs and arrived at a not-terrible [ColumnName](https://docs.rs/delta_kernel/latest/delta_kernel/expressions/struct.ColumnName.html) API -- you're welcome to use that as a starting point if it's helpful.
   
   (aside: I just noticed that the helper `column_name!` macro is not showing up in the public docs... but it is a statically verified version of [ColumnName::from_naive_str_split](https://docs.rs/delta_kernel/latest/delta_kernel/expressions/struct.ColumnName.html#method.from_naive_str_split) that accepts only string literals composed of ASCII alphanumeric, `_`, and `.` characters)
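
   To make the idea concrete, here is a rough, std-only sketch of what a column path type for nested fields might look like. Everything in it (`ColumnPath`, `try_from_simple_str`, `child`) is a hypothetical name for illustration; it is not the delta_kernel `ColumnName` API and not anything that exists in arrow-json. It also validates at runtime, whereas the `column_name!` macro mentioned above is described as statically verified.

```rust
/// Hypothetical path to a (possibly nested) column, e.g. ["a", "b", "c"].
/// Illustrative only: not the delta_kernel `ColumnName` type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ColumnPath {
    segments: Vec<String>,
}

impl ColumnPath {
    /// Builds a path from already-separated field names.
    pub fn new<S: Into<String>>(segments: impl IntoIterator<Item = S>) -> Self {
        Self {
            segments: segments.into_iter().map(Into::into).collect(),
        }
    }

    /// Splits on every '.' with no escaping, so a field whose name
    /// contains a dot cannot be expressed through this constructor.
    pub fn from_naive_str_split(path: &str) -> Self {
        Self::new(path.split('.'))
    }

    /// Runtime-checked variant: accepts only ASCII alphanumerics, '_' and '.',
    /// and rejects empty segments such as "a..b" or "".
    pub fn try_from_simple_str(path: &str) -> Result<Self, String> {
        let simple = path
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '.');
        if !simple {
            return Err(format!("unsupported character in column path: {path:?}"));
        }
        if path.split('.').any(|segment| segment.is_empty()) {
            return Err(format!("empty segment in column path: {path:?}"));
        }
        Ok(Self::from_naive_str_split(path))
    }

    /// Returns the path of a child field nested under this column.
    pub fn child(&self, name: &str) -> Self {
        let mut segments = self.segments.clone();
        segments.push(name.to_string());
        Self { segments }
    }

    /// The individual field names, outermost first.
    pub fn segments(&self) -> &[String] {
        &self.segments
    }
}
```

   A factory keyed on such a path could then distinguish `a.b` from a top-level `b`, which a bare `DataType` match cannot do.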


