Re: [PR] Acceleration : Iceberg table compaction [datafusion-comet]

via GitHub Mon, 16 Feb 2026 09:35:48 -0800


Shekharrajak commented on PR #3519:
URL: 
https://github.com/apache/datafusion-comet/pull/3519#issuecomment-3909729883


   > Provide a high level architecture diagram?
   
   
   <img width="907" height="715" alt="Screenshot 2026-02-16 at 10 58 33 PM" 
src="https://github.com/user-attachments/assets/c5948ab6-55bc-4aef-9108-50fc54605c93"
 />
   
   
   Reference for the rewrite commit API: 
https://github.com/apache/iceberg-rust/pull/2106. In this PR the commit 
happens in the JVM; future PRs can make it native as well.
   
   > Explain where the performance benefit comes from, and why is it so much 
faster to pass batches over this JNI interface than the existing interface?
   
   Compaction is all about reading small files and writing them back as 
larger files, so it is I/O-intensive work.
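
   To illustrate the "small files -> larger files" idea, here is a minimal 
sketch of how small files might be grouped into rewrite units. It is not the 
planner used by this PR or by Iceberg: `DataFile`, `plan_compaction` and 
`target_bytes` are hypothetical names, and first-fit-decreasing bin packing is 
an assumption chosen only for illustration.

```rust
/// A small data file that is a candidate for compaction.
/// Hypothetical type for illustration; not an Iceberg or Comet struct.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub size_bytes: u64,
}

/// Groups files so that each group's total size stays at or below
/// `target_bytes` (first-fit decreasing). Each returned group would be
/// read and rewritten as one larger file. Groups holding a single file
/// are dropped, since rewriting them would not merge anything.
pub fn plan_compaction(files: &[DataFile], target_bytes: u64) -> Vec<Vec<DataFile>> {
    // Place the largest files first so the smaller ones can fill the gaps.
    let mut sorted: Vec<DataFile> = files.to_vec();
    sorted.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    // Each entry is (running total in bytes, files in the group).
    let mut groups: Vec<(u64, Vec<DataFile>)> = Vec::new();
    for file in sorted {
        // Find the first existing group that still has room for this file.
        let slot = groups
            .iter()
            .position(|(total, _)| total.saturating_add(file.size_bytes) <= target_bytes);
        match slot {
            Some(i) => {
                groups[i].0 += file.size_bytes;
                groups[i].1.push(file);
            }
            // No group has room (or the file alone exceeds the target).
            None => groups.push((file.size_bytes, vec![file])),
        }
    }

    groups
        .into_iter()
        .map(|(_, members)| members)
        .filter(|members| members.len() > 1)
        .collect()
}
```

   In this sketch, each returned group stands for one read-many, write-one 
rewrite; the actual file selection in the PR may use different criteria.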
   
   Doing both the read and the write in Rust is what improves performance: 
the entire I/O pipeline (Parquet read -> Arrow RecordBatch -> Parquet write) 
runs in Rust, which eliminates the Spark orchestration layer altogether 
rather than just replacing individual operators within it.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

