Re: [PR] Implement arrow-avro SchemaStore and Fingerprinting To Enable Schema Resolution [arrow-rs]

via GitHub Mon, 04 Aug 2025 17:56:51 -0700


scovich commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2252866464



##########
arrow-avro/src/reader/mod.rs:
##########
@@ -124,23 +132,26 @@ fn read_header<R: BufRead>(mut reader: R) -> Result<Header, ArrowError> {
 /// A low-level interface for decoding Avro-encoded bytes into Arrow `RecordBatch`.
 #[derive(Debug)]
 pub struct Decoder {
-    record_decoder: RecordDecoder,
+    active_decoder: RecordDecoder,
+    active_fingerprint: Option<Fingerprint>,
     batch_size: usize,
-    decoded_rows: usize,
+    remaining_capacity: usize,
+    #[cfg(feature = "lru")]
+    cache: LruCache<Fingerprint, RecordDecoder>,
+    #[cfg(not(feature = "lru"))]
+    cache: IndexMap<Fingerprint, RecordDecoder>,
+    max_cache_size: usize,
+    reader_schema: Option<AvroSchema<'static>>,
+    writer_schema_store: Option<SchemaStore<'static>>,

Review Comment:
   > `HeaderDecoder` seems able to get around this by parsing a json string 
from the first row of an Avro file during the `ReaderBuilder::build` process
   
   The `serde_json` code is doing [zero-copy 
deserialization](https://serde.rs/lifetimes.html) from the header's byte 
buffer. There's an implicit lifetime as if declared by:
   ```rust
   pub fn schema<'a>(&'a self) -> Result<Option<Schema<'a>>, ArrowError> {
   ```
   
   So the resulting schema borrows from the header and cannot outlive it. AFAIK, the only way to get a similar effect here is to add a lifetime param to `Decoder`. I guess we could also try to create the schemas internally, but then `Decoder` would hold both the source bytes and a schema borrowing from them, and self-referential lifetimes tend to get really messy really fast in Rust.
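
To make the lifetime relationship concrete, here is a minimal std-only sketch. It is not the arrow-avro code: `Header`, `Schema`, and `BorrowingDecoder` are hypothetical stand-ins, and a plain `str::from_utf8` borrow takes the place of the `serde_json` zero-copy parse. It only illustrates how a schema returned from `&self` is tied to the header, and what a lifetime param on the decoder would look like.

```rust
use std::str;

/// Hypothetical stand-in for the Avro file header: it owns the raw bytes.
pub struct Header {
    meta: Vec<u8>,
}

/// Hypothetical stand-in for a zero-copy schema: it only borrows from the header.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Schema<'a> {
    pub json: &'a str,
}

impl Header {
    pub fn new(meta: Vec<u8>) -> Self {
        Self { meta }
    }

    /// Raw bytes, exposed so callers can check that no copy was made.
    pub fn as_bytes(&self) -> &[u8] {
        &self.meta
    }

    /// The lifetime is spelled out here; with elision this is just `&self`
    /// and `Schema<'_>`. Either way the result borrows from `self`.
    pub fn schema<'a>(&'a self) -> Result<Option<Schema<'a>>, str::Utf8Error> {
        if self.meta.is_empty() {
            return Ok(None);
        }
        // No allocation: `json` points into `self.meta`.
        let json = str::from_utf8(&self.meta)?;
        Ok(Some(Schema { json }))
    }
}

/// The "add a lifetime param" option: the decoder cannot outlive the
/// buffer that its schema borrows from.
pub struct BorrowingDecoder<'a> {
    reader_schema: Option<Schema<'a>>,
}

impl<'a> BorrowingDecoder<'a> {
    pub fn new(reader_schema: Option<Schema<'a>>) -> Self {
        Self { reader_schema }
    }

    pub fn schema_json(&self) -> Option<&'a str> {
        self.reader_schema.map(|s| s.json)
    }
}
```

With this shape, a `BorrowingDecoder` built from `header.schema()` fails to compile if `header` is dropped first. That is the constraint a lifetime param on `Decoder` would push onto every caller, and it is what the `'static` bounds in the diff avoid.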



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
