Shekharrajak commented on PR #3519: URL: https://github.com/apache/datafusion-comet/pull/3519#issuecomment-3909729883
> Provide a high level architecture diagram?

<img width="907" height="715" alt="Screenshot 2026-02-16 at 10 58 33 PM" src="https://github.com/user-attachments/assets/c5948ab6-55bc-4aef-9108-50fc54605c93" />

Reference for the rewrite commit API: https://github.com/apache/iceberg-rust/pull/2106. In this PR the commit happens in the JVM; future PRs can make it native as well.

> Explain where the performance benefit comes from, and why is it so much faster to pass batches over this JNI interface than the existing interface?

Compaction is all about reading small files and writing them back as larger files, so it is I/O-intensive work. Doing both the read and the write in Rust is what improves performance: the entire I/O pipeline (Parquet read -> Arrow RecordBatch -> Parquet write) runs in Rust, which eliminates the Spark orchestration layer altogether rather than just replacing individual operators within it.
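To make the shape of that pipeline concrete, here is a minimal, stdlib-only sketch of the read -> batch -> write loop. It is not the code in this PR: lines of text stand in for Parquet rows, `Vec<String>` stands in for an Arrow RecordBatch, and the names `Batch`, `compact`, and `batch_rows` are hypothetical. The only point is that several small inputs are streamed into one larger output inside a single native loop, with no per-operator hand-off in between.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Hypothetical stand-in for an Arrow RecordBatch: a chunk of rows.
type Batch = Vec<String>;

/// Stream every small input into one larger output, one batch at a time.
/// Returns the total number of rows written.
fn compact<R: Read, W: Write>(
    inputs: Vec<R>,
    mut output: W,
    batch_rows: usize,
) -> io::Result<usize> {
    let mut written = 0;
    for input in inputs {
        let mut lines = BufReader::new(input).lines();
        loop {
            // Read step: pull up to `batch_rows` rows from the small input.
            let mut batch: Batch = Vec::with_capacity(batch_rows);
            for line in lines.by_ref().take(batch_rows) {
                batch.push(line?);
            }
            if batch.is_empty() {
                break; // this input is exhausted, move on to the next one
            }
            // Write step: append the batch to the single larger output.
            for row in &batch {
                writeln!(output, "{row}")?;
            }
            written += batch.len();
        }
    }
    output.flush()?;
    Ok(written)
}
```

For example, calling `compact` with two in-memory inputs (`"a\nb\n"` and `"c\n"` as byte slices) and a `Vec<u8>` as the output copies all three rows into the one output buffer. In a real implementation the reader and writer would be Parquet-aware and the batches would be columnar, but the control flow would be similar.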
