Re: [PR] feat: Implement custom RecordBatch serde for shuffle for improved performance [datafusion-comet]

via GitHub Sat, 11 Jan 2025 01:36:58 -0800


tustvold commented on PR #1190:
URL: https://github.com/apache/datafusion-comet/pull/1190#issuecomment-2585173660


   FWIW this encoding format is almost identical to the Arrow IPC format 
AFAICT, with only some minor changes to the metadata encoding. The exact same 
validation is done when reading an array.
   
   I suspect that much of the overhead comes from fixed costs in StreamWriter, 
e.g. encoding the schema, and that these could be optimised and/or eliminated 
by using the lower-level APIs such as 
[write_message](https://docs.rs/arrow-ipc/latest/arrow_ipc/writer/fn.write_message.html) 
and 
[root_as_message](https://docs.rs/arrow-ipc/latest/arrow_ipc/gen/Message/fn.root_as_message.html). 
The benchmarks at least appear to agree with this, showing a relatively fixed 
performance delta on the order of tens of microseconds between the two encoders.
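
   To illustrate why a re-encoded schema would show up as a fixed delta, here 
is a toy sketch using only the Rust standard library. It does not call 
arrow-ipc; `ToySchema`, `write_framed`, and the two encoder functions are 
hypothetical names, and it assumes the stream-style path writes a schema 
message for every batch.

```rust
use std::io::{self, Write};

/// Toy stand-in for a schema (hypothetical, not an arrow type).
pub struct ToySchema {
    pub columns: Vec<String>,
}

/// Serialize the toy schema: each column name as a u32 length plus UTF-8 bytes.
pub fn encode_schema(schema: &ToySchema) -> Vec<u8> {
    let mut buf = Vec::new();
    for name in &schema.columns {
        buf.extend_from_slice(&(name.len() as u32).to_le_bytes());
        buf.extend_from_slice(name.as_bytes());
    }
    buf
}

/// Serialize a toy batch: a single column of i64 values, little-endian.
pub fn encode_batch(values: &[i64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Write one length-prefixed message and return the number of bytes written.
pub fn write_framed<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<usize> {
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)?;
    Ok(4 + payload.len())
}

/// Stream-style (assumed): the schema message is re-encoded and written for every batch.
pub fn encode_with_schema_each_time(
    schema: &ToySchema,
    batches: &[Vec<i64>],
) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    for batch in batches {
        write_framed(&mut out, &encode_schema(schema))?;
        write_framed(&mut out, &encode_batch(batch))?;
    }
    Ok(out)
}

/// Message-style: the schema is known out of band, so only batch messages are written.
pub fn encode_messages_only(batches: &[Vec<i64>]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    for batch in batches {
        write_framed(&mut out, &encode_batch(batch))?;
    }
    Ok(out)
}
```

   In this toy model the gap between the two outputs is one framed schema per 
batch, regardless of how many rows each batch holds. That is the shape of a 
roughly constant delta, though the sketch only counts bytes and says nothing 
about the actual timings.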


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

