scovich commented on issue #9211:
URL: https://github.com/apache/arrow-rs/issues/9211#issuecomment-3773794270
> I do not have a lot of knowledge about the current Avro implementation, but I wonder if you might get a speedup by splitting the decoding into two phases, to generate better-vectorized code:
>
> 1. Decode row-level bytes into temporary per-column buffers (bytes/offsets/lengths, etc.) based on each column's data-type width / "variableness" (e.g. given `[FIXED(4), VARIABLE, FIXED(2)]`, decode the data into three buffers in a "simple" loop).
>    1.b. This phase could perhaps specialize/optimize for the fixed-width-only schema case, or for when fixed columns can be handled separately from variable-width ones.
> 2. Parse/convert the data types per-column to Arrow based on the schema (this should vectorize very well).

This sounds a bit like the arrow-json tape decoder approach?
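The quoted two-phase split can be sketched with a toy row format. This is purely illustrative, not the arrow-avro implementation: a 1-byte length prefix stands in for Avro's actual zig-zag varint encoding, and `Width`, `split_columns`, and `decode_i32_column` are hypothetical names invented for the sketch.

```rust
/// Toy column width descriptor (hypothetical; not an arrow-avro type).
#[derive(Clone, Copy)]
enum Width {
    Fixed(usize),
    Variable, // in this toy encoding: a 1-byte length prefix, then the bytes
}

/// Phase 1: scan each row once and scatter its raw bytes into
/// per-column byte buffers, driven only by width/"variableness".
fn split_columns(rows: &[&[u8]], schema: &[Width]) -> Vec<Vec<u8>> {
    let mut cols: Vec<Vec<u8>> = vec![Vec::new(); schema.len()];
    for row in rows {
        let mut pos = 0;
        for (col, w) in cols.iter_mut().zip(schema) {
            let len = match *w {
                Width::Fixed(n) => n,
                Width::Variable => {
                    let n = row[pos] as usize; // toy 1-byte length prefix
                    pos += 1;
                    n
                }
            };
            col.extend_from_slice(&row[pos..pos + len]);
            pos += len;
        }
    }
    cols
}

/// Phase 2: per-column conversion in a tight loop the compiler can
/// vectorize, e.g. reinterpreting a FIXED(4) column as little-endian i32s.
fn decode_i32_column(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

fn main() {
    let schema = [Width::Fixed(4), Width::Variable, Width::Fixed(2)];
    // Two rows: an i32, a length-prefixed string, and two fixed bytes.
    let row0: &[u8] = &[1, 0, 0, 0, 2, b'h', b'i', 9, 9];
    let row1: &[u8] = &[7, 0, 0, 0, 1, b'x', 8, 8];
    let cols = split_columns(&[row0, row1], &schema);
    assert_eq!(decode_i32_column(&cols[0]), vec![1, 7]);
    assert_eq!(cols[1], b"hix");
    println!("col0 = {:?}", decode_i32_column(&cols[0]));
}
```

Phase 1 is a branchy but cache-friendly single pass over each row; phase 2 then touches one contiguous buffer per column, which is where the vectorization win would come from.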
