[
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667895#comment-15667895
]
Wes McKinney commented on ARROW-300:
------------------------------------
One issue with doing compression only at the transport level arises if people
use the Arrow memory layout and metadata to create file formats for storing
larger amounts of data. For example, I would like to deprecate the Feather
metadata
https://github.com/wesm/feather/blob/master/cpp/src/feather/metadata.fbs and
use only the Arrow metadata. Without column/buffer-level compression, reading
only a subset of the file would be expensive, since the whole compressed
stream would have to be decompressed first. You could argue that such data
should be stored as Parquet instead, but an Arrow-based file format offers a
flexibility that's really appealing (particularly since random access on
memory-mapped Arrow-like data would be possible).
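
To make the point about subset reads concrete, here is a minimal sketch in
Python. It is not the Arrow IPC file format and not a proposed implementation:
the layout (a JSON footer followed by a 4-byte length) and the names
write_columns and read_column are hypothetical, chosen only for illustration.
Each column buffer is compressed on its own, and a single global codec setting
is recorded in the footer, so a reader can decompress one column without
touching the others.

```python
import io
import json
import struct
import zlib

# Hypothetical codec table; only stdlib zlib is shown here. An lz4 entry
# would need a third-party package and is left out of this sketch.
CODECS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
}


def write_columns(columns, codec="zlib"):
    """Compress each column buffer independently and return the file bytes.

    Toy layout (an assumption, not Arrow's):
    [buffer 0][buffer 1]...[JSON footer][uint32 little-endian footer length]
    """
    compress, _ = CODECS[codec]
    out = io.BytesIO()
    buffers = {}
    for name, raw in columns.items():
        body = compress(raw)
        # Record where this buffer lives so it can be read in isolation.
        buffers[name] = {"offset": out.tell(), "length": len(body)}
        out.write(body)
    # One global compression setting, stored once in the footer.
    footer = json.dumps({"codec": codec, "buffers": buffers}).encode("utf-8")
    out.write(footer)
    out.write(struct.pack("<I", len(footer)))
    return out.getvalue()


def read_column(data, name):
    """Decompress a single column without decompressing the others."""
    (footer_len,) = struct.unpack("<I", data[-4:])
    footer = json.loads(data[-4 - footer_len:-4].decode("utf-8"))
    _, decompress = CODECS[footer["codec"]]
    entry = footer["buffers"][name]
    start = entry["offset"]
    return decompress(data[start:start + entry["length"]])


columns = {"a": b"\x00" * 1000, "b": bytes(range(256)) * 4}
blob = write_columns(columns, codec="zlib")
column_b = read_column(blob, "b")  # only buffer "b" is decompressed
```

With transport-level compression, by contrast, the reader in this sketch would
have to decompress the entire blob before it could locate column "b".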
> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>
> Key: ARROW-300
> URL: https://issues.apache.org/jira/browse/ARROW-300
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Format
> Reporter: Wes McKinney
>
> It may be useful, if data is to be sent over the wire, to compress the data
> buffers themselves as they are being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer
> compression setting in the file Footer. Probably the only two compressors
> worth supporting out of the box would be zlib (higher compression ratios) and
> lz4 (better performance).
> What does everyone think?