telemenar commented on PR #569: URL: https://github.com/apache/arrow-site/pull/569#issuecomment-2578218562
> The intro of the blog post points to ser/de as a benefit to the arrow format. I'm curious if a reference exists (and can be, or will eventually be, added) that shows a similar comparison for arrow vs parquet. Mostly in the sense that storage sits in a mechanically similar spot (but the serialization and deserialization have an arbitrarily large time gap between their execution).

Another thing that feeds into this, beyond the storage benefits called out here:

> Thanks @drin. This is part of what the second post in the series will cover. It will describe why formats like Parquet and ORC are typically better than Arrow for archival storage (mostly because higher compression ratios mean lower cost to store for long periods, which easily outweighs the tradeoff of higher ser/de overheads).

is that for archival storage, in addition to the cost aspect, you are generally doing `ser` once and `de` many times, which changes your tradeoffs. In the pure compression algo space, this might be the difference between choosing lz4 (wire) and zstd (archival).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
