ameyc opened a new issue, #11342: URL: https://github.com/apache/datafusion/issues/11342
### Is your feature request related to a problem or challenge?

We are currently working on a stream processing system built atop DataFusion, and Avro is a major format for us given its ubiquity in the Kafka world. We tried using the existing Avro reader in DataFusion, but found it lacking in some critical ways that make it not terribly useful for us in its present state.

The reader currently does not support complex nested datatypes. For example:

1. The List arrays [only support primitive types](https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs#L627).
2. Dictionary arrays only support Utf8 as their value type.

Lastly, the reader seems to rely on the `decode_internal` method of the `apache-avro` crate and to implement some of the Avro decoding "by hand". We ended up rolling our own reader to support these types, and we were able to use `decode_from_avro` datum and pass the Avro decoding responsibility entirely to the avro package.

Would love to work with @tustvold, who seems to have contributed the most here, to address the existing limitations.

### Describe the solution you'd like

Addition of support for parsing complex datatypes.

### Describe alternatives you've considered

Convert Avro to JSON, then rely on the json_to_arrow conversion, but this inevitably loses type information.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
