Re: [I] Optimized spill file format [datafusion]

via GitHub Wed, 16 Apr 2025 11:31:16 -0700


alamb commented on issue #14078:
URL: https://github.com/apache/datafusion/issues/14078#issuecomment-2810394103


   > The tricky part to implement is array encoding like REE or bit-packing for
   > integer arrays. Maybe we can find some reusable code in Arrow Parquet writer
   > implementation or use something like https://github.com/spiraldb/vortex. But
   > it's okay to start without those encodings.
   
   I think we might get pretty far by simply compressing the Arrow stream with
zstd or snappy (which would be far simpler to implement, as the Arrow writer
already supports this).
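
   A minimal sketch of the "just compress the serialized stream" idea, to show
why it can go a long way without REE or bit-packing. This is not DataFusion or
Arrow code. It assumes a single int64 column packed into one contiguous buffer,
and it uses zlib from the Python standard library as a stand-in for zstd or
snappy. The names `spill_column` and `read_column` are hypothetical.

```python
import struct
import zlib


def spill_column(values, level=3):
    """Pack int64 values into a little-endian buffer, then compress it.

    Hypothetical helper: zlib stands in for zstd/snappy here.
    """
    raw = struct.pack(f"<{len(values)}q", *values)
    return raw, zlib.compress(raw, level)


def read_column(compressed):
    """Decompress a spilled buffer and unpack it back into Python ints."""
    raw = zlib.decompress(compressed)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


# A column with long runs of repeated values, the kind of data that REE or
# bit-packing would target directly.
values = [i // 1000 for i in range(100_000)]

raw, compressed = spill_column(values)
assert read_column(compressed) == values  # lossless round trip
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

   On data with long runs like this, the general-purpose compressor already
removes most of the redundancy that a dedicated encoding would, with no
encoding-specific code on the write or read path. For reference, the buffer
compression codecs defined by the Arrow IPC format are LZ4_FRAME and ZSTD.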
   
   


