jecsand838 commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2251801967


##########
arrow-avro/src/reader/mod.rs:
##########
@@ -124,23 +132,26 @@ fn read_header<R: BufRead>(mut reader: R) -> 
Result<Header, ArrowError> {
 /// A low-level interface for decoding Avro-encoded bytes into Arrow 
`RecordBatch`.
 #[derive(Debug)]
 pub struct Decoder {
-    record_decoder: RecordDecoder,
+    active_decoder: RecordDecoder,
+    active_fingerprint: Option<Fingerprint>,
     batch_size: usize,
-    decoded_rows: usize,
+    remaining_capacity: usize,
+    #[cfg(feature = "lru")]
+    cache: LruCache<Fingerprint, RecordDecoder>,
+    #[cfg(not(feature = "lru"))]
+    cache: IndexMap<Fingerprint, RecordDecoder>,
+    max_cache_size: usize,
+    reader_schema: Option<AvroSchema<'static>>,
+    writer_schema_store: Option<SchemaStore<'static>>,

Review Comment:
   > (I don't actually understand why avro schemas have lifetimes in the first 
place? Especially if they anyway require static lifetime in practice?)
   
   I'm not 100% certain why that decision was made. I was simply trying to 
extend from the original design pattern. 
   
   My opinion is we're taking advantage of an Avro nuance (knowing the schemas 
in advance of record decoding) to minimize overhead via a borrowed schema 
approach that leverages zero-copy parsing. This makes sense since every 
scenario requires us to know the writer and reader schema (if using schema 
evolution) in advance of instantiating a `RecordDecoder`. 
   



##########
arrow-avro/src/reader/mod.rs:
##########
@@ -124,23 +132,26 @@ fn read_header<R: BufRead>(mut reader: R) -> 
Result<Header, ArrowError> {
 /// A low-level interface for decoding Avro-encoded bytes into Arrow 
`RecordBatch`.
 #[derive(Debug)]
 pub struct Decoder {
-    record_decoder: RecordDecoder,
+    active_decoder: RecordDecoder,
+    active_fingerprint: Option<Fingerprint>,
     batch_size: usize,
-    decoded_rows: usize,
+    remaining_capacity: usize,
+    #[cfg(feature = "lru")]
+    cache: LruCache<Fingerprint, RecordDecoder>,
+    #[cfg(not(feature = "lru"))]
+    cache: IndexMap<Fingerprint, RecordDecoder>,
+    max_cache_size: usize,
+    reader_schema: Option<AvroSchema<'static>>,
+    writer_schema_store: Option<SchemaStore<'static>>,

Review Comment:
   Because the use of lifetimes predates both this PR and my involvement, would 
you be ok with this being addressed in a follow-up PR prior to going public? 
@scovich 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to