scovich commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2252866464
##########
arrow-avro/src/reader/mod.rs:
##########
@@ -124,23 +132,26 @@ fn read_header<R: BufRead>(mut reader: R) -> Result<Header, ArrowError> {
/// A low-level interface for decoding Avro-encoded bytes into Arrow `RecordBatch`.
#[derive(Debug)]
pub struct Decoder {
- record_decoder: RecordDecoder,
+ active_decoder: RecordDecoder,
+ active_fingerprint: Option<Fingerprint>,
batch_size: usize,
- decoded_rows: usize,
+ remaining_capacity: usize,
+ #[cfg(feature = "lru")]
+ cache: LruCache<Fingerprint, RecordDecoder>,
+ #[cfg(not(feature = "lru"))]
+ cache: IndexMap<Fingerprint, RecordDecoder>,
+ max_cache_size: usize,
+ reader_schema: Option<AvroSchema<'static>>,
+ writer_schema_store: Option<SchemaStore<'static>>,
Review Comment:
> `HeaderDecoder` seems able to get around this by parsing a json string
> from the first row of an Avro file during the `ReaderBuilder::build` process

The `serde_json` code is doing [zero-copy
deserialization](https://serde.rs/lifetimes.html) from the header's byte
buffer. There's an implicit lifetime as if declared by:
```rust
pub fn schema<'a>(&'a self) -> Result<Option<Schema<'a>>, ArrowError> {
```
So the resulting schema inherits the lifetime of the header itself. AFAIK,
the only way we could get a similar effect here is to add a lifetime param to
the `Decoder`. I guess we could also try to create schemas internally, but
self-referential lifetimes tend to get really messy really fast in Rust.
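
To make the lifetime point concrete, here is a minimal std-only sketch. The `Header`, `Schema`, and `Decoder` types below are hypothetical stand-ins rather than the actual arrow-avro types, and `std::str::from_utf8` stands in for the zero-copy `serde_json` parse. It only illustrates how a borrowed schema ties its holder to the buffer it was parsed from:

```rust
// Hypothetical stand-ins for illustration only; not the arrow-avro types.
struct Header {
    // Owns the raw bytes that the schema text lives in.
    meta: Vec<u8>,
}

// A "schema" that only borrows from the header's buffer (zero-copy).
struct Schema<'a> {
    json: &'a str,
}

impl Header {
    // Lifetime elision makes this equivalent to
    // `fn schema<'a>(&'a self) -> Option<Schema<'a>>`.
    fn schema(&self) -> Option<Schema<'_>> {
        std::str::from_utf8(&self.meta)
            .ok()
            .map(|json| Schema { json })
    }
}

// Storing a borrowed schema forces a lifetime parameter onto the holder.
struct Decoder<'a> {
    reader_schema: Option<Schema<'a>>,
}

fn main() {
    let header = Header {
        meta: br#"{"type":"long"}"#.to_vec(),
    };
    // `decoder` borrows from `header`, so it cannot outlive it.
    let decoder = Decoder {
        reader_schema: header.schema(),
    };
    assert_eq!(decoder.reader_schema.unwrap().json, r#"{"type":"long"}"#);
}
```

Dropping the `'a` from `Decoder` in this sketch would mean the schema has to own its data (or be `'static`), which is the trade-off being discussed for the real `Decoder`.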