scovich commented on code in PR #9259:
URL: https://github.com/apache/arrow-rs/pull/9259#discussion_r2732947655
##########
arrow-json/src/reader/mod.rs:
##########
@@ -2815,4 +3021,67 @@ mod tests {
"Json error: whilst decoding field 'a': failed to parse \"a\" as Int32".to_owned()
);
}
+
+ #[test]
+ fn test_decoder_factory() {
+ use arrow_array::builder;
+
+ struct AlwaysNullStringArrayDecoder;
+
+ impl ArrayDecoder for AlwaysNullStringArrayDecoder {
+ fn decode(&mut self, _tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
+ let mut builder = builder::StringBuilder::new();
+ for _ in pos {
+ builder.append_null();
+ }
+ Ok(builder.finish().into_data())
+ }
+ }
+
+ #[derive(Debug)]
+ struct AlwaysNullStringArrayDecoderFactory;
+
+ impl DecoderFactory for AlwaysNullStringArrayDecoderFactory {
+ fn make_custom_decoder(
+ &self,
Review Comment:
After playing around in this PR, I think we need both path-based and
type-based factories:
* Type-based -- for extension types, generally lenient parsing preferences,
etc. The behavior of such decoders isn't related to their location in the schema,
and expressing them as path-based would require traversing the schema to find
them, which would be annoying.
* Path-based -- for "quirks-mode" parsing of specific badly-behaved fields,
flexible parsing of free-form fields while leaving the rest of the schema
strongly typed, etc. Es
NOTE: Although one _could_ use variant for a lot of these use cases, it adds
an extra layer of indirection and mostly pushes off the problem to whoever
consumes the resulting variant column -- if they want to do anything fancy,
they'll just end up dealing with e.g. `Variant::String` instead of
`TapeElement::String`. Not an obvious win. Granted, `variant_get` adds a lot of
capability, flexibility, and error tolerance (e.g. type mismatches just return
NULL instead of errors), so it might be worth using variant instead of custom
decoders in simple path-based cases.
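
A rough sketch of how the two lookups could coexist. It uses stand-in types rather than the real arrow-json `ArrayDecoder`/`DecoderFactory` traits, so every name below (`FieldType`, `CellDecoder`, `Factories`, `lenient_int`, `always_null`, the `payload.legacy_id` path) is hypothetical. The path-before-type precedence is also just an assumption for illustration, not something this PR defines:

```rust
use std::collections::HashMap;

// Stand-in for a schema data type; NOT arrow's `DataType`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum FieldType {
    Int32,
    Utf8,
}

// Hypothetical decoder: turns one raw JSON token into a nullable cell.
trait CellDecoder {
    fn decode(&self, raw: &str) -> Result<Option<String>, String>;
}

// Default behavior: a type mismatch is an error.
struct Strict(FieldType);
impl CellDecoder for Strict {
    fn decode(&self, raw: &str) -> Result<Option<String>, String> {
        match self.0 {
            FieldType::Int32 => raw
                .trim()
                .parse::<i32>()
                .map(|v| Some(v.to_string()))
                .map_err(|e| format!("failed to parse {:?} as Int32: {}", raw, e)),
            FieldType::Utf8 => Ok(Some(raw.to_string())),
        }
    }
}

// Type-based use case: lenient parsing, a mismatch becomes NULL.
struct LenientInt;
impl CellDecoder for LenientInt {
    fn decode(&self, raw: &str) -> Result<Option<String>, String> {
        Ok(raw.trim().parse::<i32>().ok().map(|v| v.to_string()))
    }
}

// Path-based use case: quirks-mode handling of one badly-behaved field.
struct AlwaysNull;
impl CellDecoder for AlwaysNull {
    fn decode(&self, _raw: &str) -> Result<Option<String>, String> {
        Ok(None)
    }
}

type MakeDecoder = fn() -> Box<dyn CellDecoder>;

fn lenient_int() -> Box<dyn CellDecoder> {
    Box::new(LenientInt)
}

fn always_null() -> Box<dyn CellDecoder> {
    Box::new(AlwaysNull)
}

#[derive(Default)]
struct Factories {
    by_path: HashMap<String, MakeDecoder>,
    by_type: HashMap<FieldType, MakeDecoder>,
}

impl Factories {
    fn with_type(mut self, ty: FieldType, make: MakeDecoder) -> Self {
        self.by_type.insert(ty, make);
        self
    }

    fn with_path(mut self, path: &str, make: MakeDecoder) -> Self {
        self.by_path.insert(path.to_string(), make);
        self
    }

    // Assumed precedence: path override, then type override, then strict default.
    fn make_decoder(&self, path: &str, ty: &FieldType) -> Box<dyn CellDecoder> {
        if let Some(make) = self.by_path.get(path) {
            return make();
        }
        if let Some(make) = self.by_type.get(ty) {
            return make();
        }
        Box::new(Strict(ty.clone()))
    }
}

// Decodes the same bad token under a type override and under a path override.
fn demo() -> Vec<Result<Option<String>, String>> {
    let factories = Factories::default()
        .with_type(FieldType::Int32, lenient_int)
        .with_path("payload.legacy_id", always_null);
    vec![
        factories.make_decoder("a", &FieldType::Int32).decode("oops"),
        factories
            .make_decoder("payload.legacy_id", &FieldType::Int32)
            .decode("7"),
    ]
}
```

With this shape, a type-based registration needs no knowledge of where the type appears in the schema, while a path-based one touches a single field and leaves the rest strongly typed.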