tustvold commented on PR #1190: URL: https://github.com/apache/datafusion-comet/pull/1190#issuecomment-2585173660
FWIW, this encoding format is almost identical to the IPC format AFAICT, with only some minor changes to the metadata encoding, and exactly the same validation is done when reading an array. I suspect that much of the overhead comes from fixed costs in StreamWriter, e.g. encoding the schema, and that these could be optimised and/or eliminated by using the lower-level APIs such as [write_message](https://docs.rs/arrow-ipc/latest/arrow_ipc/writer/fn.write_message.html) and [root_as_message](https://docs.rs/arrow-ipc/latest/arrow_ipc/gen/Message/fn.root_as_message.html). The benchmarks at least appear to agree with this: the performance delta between the two encoders is relatively fixed, on the order of tens of microseconds.
