scovich commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2252866464
########## arrow-avro/src/reader/mod.rs: ##########
@@ -124,23 +132,26 @@ fn read_header<R: BufRead>(mut reader: R) -> Result<Header, ArrowError> {
 /// A low-level interface for decoding Avro-encoded bytes into Arrow `RecordBatch`.
 #[derive(Debug)]
 pub struct Decoder {
-    record_decoder: RecordDecoder,
+    active_decoder: RecordDecoder,
+    active_fingerprint: Option<Fingerprint>,
     batch_size: usize,
-    decoded_rows: usize,
+    remaining_capacity: usize,
+    #[cfg(feature = "lru")]
+    cache: LruCache<Fingerprint, RecordDecoder>,
+    #[cfg(not(feature = "lru"))]
+    cache: IndexMap<Fingerprint, RecordDecoder>,
+    max_cache_size: usize,
+    reader_schema: Option<AvroSchema<'static>>,
+    writer_schema_store: Option<SchemaStore<'static>>,

Review Comment:
> `HeaderDecoder` seems able to get around this by parsing a json string from the first row of an Avro file during the `ReaderBuilder::build` process

The `serde_json` code is doing [zero-copy deserialization](https://serde.rs/lifetimes.html) from the header's byte buffer. There's an implicit lifetime, as if the method were declared as:

```rust
pub fn schema<'a>(&'a self) -> Result<Option<Schema<'a>>, ArrowError> {
```

So the resulting schema borrows from, and inherits the lifetime of, the header itself.

AFAIK, the only way we could get a similar effect here is to add a lifetime param to `Decoder`. I guess we could also have the `Decoder` own the schema bytes and create the schemas internally, but that makes `Decoder` self-referential, and self-referential lifetimes tend to get really messy really fast in Rust.
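A minimal, std-only sketch of the borrowing described above. It uses hypothetical toy types (`ToyHeader`, `ToySchema`, `ToyDecoder`), not arrow-avro's real `Header`, `AvroSchema`, or `Decoder`, and a plain string slice stands in for what `serde_json` zero-copy deserialization would borrow. It shows the elided `schema(&self)` lifetime and the "add a lifetime param to the decoder" option:

```rust
/// Hypothetical stand-in for the file header: it owns the raw schema text.
struct ToyHeader {
    schema_json: String,
}

/// Hypothetical stand-in for a schema that borrows its strings from the
/// header's buffer (zero-copy) instead of allocating owned copies.
#[derive(Debug)]
struct ToySchema<'a> {
    name: &'a str,
}

impl ToyHeader {
    /// The elided lifetime here is equivalent to
    /// `fn schema<'a>(&'a self) -> Option<ToySchema<'a>>`:
    /// the returned schema cannot outlive `self`.
    fn schema(&self) -> Option<ToySchema<'_>> {
        // Toy "parser": the text after `name=` becomes a borrowed slice.
        let name = self.schema_json.strip_prefix("name=")?;
        Some(ToySchema { name })
    }
}

/// Hypothetical decoder with a lifetime parameter, so it can hold a schema
/// that borrows from a header owned somewhere else.
struct ToyDecoder<'a> {
    reader_schema: Option<ToySchema<'a>>,
}

impl<'a> ToyDecoder<'a> {
    fn new(header: &'a ToyHeader) -> Self {
        ToyDecoder { reader_schema: header.schema() }
    }
}

fn main() {
    let header = ToyHeader { schema_json: "name=User".to_string() };
    let decoder = ToyDecoder::new(&header);
    // `decoder` borrows `header`, so `header` must outlive `decoder`;
    // dropping or moving `header` while `decoder` is in use won't compile.
    println!("{:?}", decoder.reader_schema);
}
```

The lifetime then leaks into every type that holds the decoder. The alternative of storing the `ToyHeader` and a `ToySchema` borrowing from it in the same struct would make the schema's lifetime refer to a sibling field, which safe Rust can't express usefully; it generally needs `unsafe` or a self-referencing helper crate.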