jecsand838 commented on code in PR #8006:
URL: https://github.com/apache/arrow-rs/pull/8006#discussion_r2249115313
##########
arrow-avro/src/reader/mod.rs:
##########
@@ -182,21 +174,130 @@ impl Decoder {
             FingerprintAlgorithm::Rabin,
             SchemaStore::fingerprint_algorithm,
         );
+        // The loop stops when the batch is full, a schema change is staged,
+        // or handle_prefix indicates we need more bytes (Some(0)).
         while total_consumed < data.len() && self.remaining_capacity > 0 {
-            if let Some(prefix_bytes) = self.handle_prefix(&data[total_consumed..], hash_type)? {
-                // A batch is complete when its `remaining_capacity` is 0. It may be completed early if
-                // a schema change is detected or there are insufficient bytes to read the next prefix.
-                // A schema change requires a new batch.
-                total_consumed += prefix_bytes;
-                break;
+            match self.handle_prefix(&data[total_consumed..], hash_type)? {
+                None => {
+                    // No prefix: decode one row.
+                    let n = self.active_decoder.decode(&data[total_consumed..], 1)?;
+                    total_consumed += n;
+                    self.remaining_capacity -= 1;
+                }
+                Some(0) => {
+                    // Detected start of a prefix but need more bytes.
+                    break;
+                }

Review Comment:
   @scovich I got this figured out. Currently we can only decode full single-object encoded records via `Decoder::decode`. In my very next PR I'll add a partial record chunking feature to the `Decoder` which will resolve this. I have it about 90% complete on my local machine; I just didn't want to grow this PR any further.
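For anyone following along, here is a rough caller-side sketch of what that constraint means in practice: each call to `decode` must be handed one complete single-object encoded record (magic bytes + fingerprint + body), e.g. a single Kafka message. It assumes the public `decode`/`flush` surface behaves like the other arrow-rs streaming decoders (`decode(&[u8])` returns the number of bytes consumed, `flush()` drains any buffered rows into a `RecordBatch`); the helper function and the mid-batch flush handling below are illustrative only, not code from this PR.

```rust
use arrow_array::RecordBatch;
use arrow_avro::reader::Decoder;
use arrow_schema::ArrowError;

/// Caller-side sketch of the current contract: every `msg` handed to
/// `Decoder::decode` is assumed to be one complete single-object encoded
/// record. The `decode`/`flush` shape is assumed to mirror the other
/// arrow-rs streaming decoders; `decode_complete_records` is an
/// illustrative helper, not an API in this PR.
fn decode_complete_records(
    decoder: &mut Decoder,
    messages: &[&[u8]],
) -> Result<Vec<RecordBatch>, ArrowError> {
    let mut batches = Vec::new();
    for msg in messages {
        let consumed = decoder.decode(msg)?;
        if consumed < msg.len() {
            // The in-progress batch filled up before this record was taken;
            // drain it and feed the rest of the (still complete) record.
            if let Some(batch) = decoder.flush()? {
                batches.push(batch);
            }
            decoder.decode(&msg[consumed..])?;
        }
    }
    // Drain whatever rows are still buffered after the last message.
    if let Some(batch) = decoder.flush()? {
        batches.push(batch);
    }
    Ok(batches)
}
```

The key point is that the caller, not the `Decoder`, is responsible for framing today; once the partial record chunking feature lands, truncated slices could be fed as they arrive instead.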