jecsand838 commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2234676478
##########
arrow-avro/src/reader/mod.rs:
##########
@@ -116,88 +129,295 @@ fn read_header<R: BufRead>(mut reader: R) -> Result<Header, ArrowError> {
break;
}
}
- decoder.flush().ok_or_else(|| {
- ArrowError::ParseError("Unexpected EOF while reading Avro header".to_string())
- })
+ decoder
+ .flush()
+ .ok_or_else(|| ArrowError::ParseError("Unexpected EOF while reading Avro header".into()))
}
/// A low-level interface for decoding Avro-encoded bytes into Arrow `RecordBatch`.
+///
+/// This decoder handles both standard Avro container file data and single-object encoded
+/// messages by managing schema resolution and caching decoders.
#[derive(Debug)]
pub struct Decoder {
- record_decoder: RecordDecoder,
+ /// The maximum number of rows to decode into a single batch.
batch_size: usize,
+ /// The number of rows decoded into the current batch.
decoded_rows: usize,
+ /// The fingerprint of the active writer schema.
+ active_fp: Option<Fingerprint>,
+ /// The `RecordDecoder` corresponding to the active writer schema.
+ active_decoder: RecordDecoder,
+ /// An LRU cache of inactive `RecordDecoder`s, keyed by schema fingerprint.
+ cache: HashMap<Fingerprint, RecordDecoder>,
Review Comment:
I pushed up the `IndexMap` changes. Thank you again for that one.
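
For context on why an insertion-ordered map matters for the field above: a plain `HashMap` has no notion of order, so it cannot tell which cached decoder is the least recently used. The following is a minimal sketch of the idea only, not the code in this PR. It uses just the standard library (a `HashMap` plus a `VecDeque` for recency) as a stand-in for the `indexmap` crate's `IndexMap`, which preserves insertion order in one structure. `LruCache`, its methods, and `Fingerprint` as a `u64` are hypothetical names chosen for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the schema fingerprint type used in the PR.
type Fingerprint = u64;

/// Minimal LRU cache sketch. `order` tracks recency:
/// front = least recently used, back = most recently used.
struct LruCache<V> {
    capacity: usize,
    map: HashMap<Fingerprint, V>,
    order: VecDeque<Fingerprint>,
}

impl<V> LruCache<V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Move `fp` to the most-recently-used position (O(n) scan; fine for a sketch).
    fn touch(&mut self, fp: Fingerprint) {
        self.order.retain(|k| *k != fp);
        self.order.push_back(fp);
    }

    /// Insert a value, evicting the least recently used entry when the cache is full.
    /// Returns the evicted fingerprint, if any.
    fn insert(&mut self, fp: Fingerprint, value: V) -> Option<Fingerprint> {
        let mut evicted = None;
        if !self.map.contains_key(&fp) && self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
                evicted = Some(oldest);
            }
        }
        self.map.insert(fp, value);
        self.touch(fp);
        evicted
    }

    /// Look up a value and mark it as recently used.
    fn get(&mut self, fp: Fingerprint) -> Option<&V> {
        if self.map.contains_key(&fp) {
            self.touch(fp);
        }
        self.map.get(&fp)
    }
}
```

With an order-preserving map the separate `order` queue is unnecessary, since the map's own iteration order can serve as the recency order.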