Re: [PR] [Website] Add post "How the Apache Arrow Format Accelerates Query Result Transfer" [arrow-site]

via GitHub Wed, 08 Jan 2025 09:23:28 -0800


telemenar commented on PR #569:
URL: https://github.com/apache/arrow-site/pull/569#issuecomment-2578218562


   > The intro of the blog post points to ser/de as a benefit to the arrow 
format. I'm curious if a reference exists (and can be, or will eventually be, 
added) that shows a similar comparison for arrow vs parquet. Mostly in the 
sense that storage sits in a mechanically similar spot (but the serialization 
and deserialization have an arbitrarily large time gap between their execution).
   
   Another thing that feeds into this beyond the storage benefits called out 
here:
   >Thanks @drin. This is part of what the second post in the series will 
cover. It will describe why formats like Parquet and ORC are typically better 
than Arrow for archival storage (mostly because higher compression ratios mean 
lower cost to store for long periods, which easily outweighs the tradeoff of 
higher ser/de overheads).
   
   is that for archival storage, in addition to the cost aspect, you are 
generally doing `ser` once and `de` many times, which changes your tradeoffs. 
In the pure compression-algorithm space, this might be the difference between 
choosing lz4 (wire) and zstd (archival).
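
   A rough way to see the "`ser` once, `de` many" asymmetry is to time one 
compression against repeated decompressions. The sketch below is only an 
illustration under stated assumptions: it uses zlib levels 1 and 9 from the 
Python standard library as stand-ins for a fast codec and a heavier codec (it 
does not use lz4 or zstd), and the `profile` helper, the synthetic payload, 
and the read counts are all hypothetical.

```python
import time
import zlib


def profile(data: bytes, level: int, reads: int) -> dict:
    """Compress once, decompress `reads` times (reads >= 1), and time both."""
    t0 = time.perf_counter()
    blob = zlib.compress(data, level)  # `ser`: paid once
    ser_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(reads):  # `de`: paid on every read
        out = zlib.decompress(blob)
    de_s = time.perf_counter() - t0

    assert out == data  # round trip must be lossless
    return {"level": level, "bytes": len(blob), "ser_s": ser_s, "de_s": de_s}


# Synthetic, repetitive CSV-like payload (an assumption, not real query data).
payload = b"".join(b"%d,%d,name%d\n" % (i, i % 7, i % 13) for i in range(50_000))

# "Wire" style: cheap compression, read once.
fast = profile(payload, level=1, reads=1)
# "Archival" style: expensive compression, read many times.
heavy = profile(payload, level=9, reads=20)

for result in (fast, heavy):
    print(result)
```

   Timings will vary by machine, so no numbers are claimed here. The point is 
only that the `ser` cost is a one-time term while the `de` cost and the stored 
size scale with the number of reads and the retention period.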


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
