scovich commented on issue #9211:
URL: https://github.com/apache/arrow-rs/issues/9211#issuecomment-3773794270
> I do not have a lot of knowledge about the current Avro implementation, but I wonder if you might get a speedup by splitting the decoding into two phases, to generate better-vectorized code:
>
> 1. Decode row-level bytes into temporary per-column buffers (bytes/offsets/lengths, etc.) based on each column's data-type width / "variableness" (e.g. given `[FIXED(4), VARIABLE, FIXED(2)]`, decode the data into three buffers in a "simple" loop).
>    1.b. This phase could perhaps specialize/optimize for the fixed-width-only schema case, or for when fixed columns can be handled separately from variable-width ones.
> 2. Parse/convert the data types per-column to Arrow based on the schema (this should vectorize very well).

This sounds a bit like the arrow-json tape decoder approach?
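The quoted two-phase split can be sketched with a toy row format. This is purely illustrative, not the arrow-avro implementation: a 1-byte length prefix stands in for Avro's actual zig-zag varint encoding, and `Width`, `split_columns`, and `decode_i32_column` are hypothetical names invented for the sketch.

```rust
/// Toy column width descriptor (hypothetical; not an arrow-avro type).
#[derive(Clone, Copy)]
enum Width {
    Fixed(usize),
    Variable, // in this toy encoding: a 1-byte length prefix, then the bytes
}

/// Phase 1: scan each row once and scatter its raw bytes into
/// per-column byte buffers, driven only by width/"variableness".
fn split_columns(rows: &[&[u8]], schema: &[Width]) -> Vec<Vec<u8>> {
    let mut cols: Vec<Vec<u8>> = vec![Vec::new(); schema.len()];
    for row in rows {
        let mut pos = 0;
        for (col, w) in cols.iter_mut().zip(schema) {
            let len = match *w {
                Width::Fixed(n) => n,
                Width::Variable => {
                    let n = row[pos] as usize; // toy 1-byte length prefix
                    pos += 1;
                    n
                }
            };
            col.extend_from_slice(&row[pos..pos + len]);
            pos += len;
        }
    }
    cols
}

/// Phase 2: per-column conversion in a tight loop the compiler can
/// vectorize, e.g. reinterpreting a FIXED(4) column as little-endian i32s.
fn decode_i32_column(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

fn main() {
    let schema = [Width::Fixed(4), Width::Variable, Width::Fixed(2)];
    // Two rows: an i32, a length-prefixed string, and two fixed bytes.
    let row0: &[u8] = &[1, 0, 0, 0, 2, b'h', b'i', 9, 9];
    let row1: &[u8] = &[7, 0, 0, 0, 1, b'x', 8, 8];
    let cols = split_columns(&[row0, row1], &schema);
    assert_eq!(decode_i32_column(&cols[0]), vec![1, 7]);
    assert_eq!(cols[1], b"hix");
    println!("col0 = {:?}", decode_i32_column(&cols[0]));
}
```

Phase 1 is a branchy but cache-friendly single pass over each row; phase 2 then touches one contiguous buffer per column, which is where the vectorization win would come from.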
