[ 
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967616#comment-15967616
 ] 

Wes McKinney commented on ARROW-300:
------------------------------------

[~kiszk] I agree that having in-memory compression schemes like Spark's is a 
good idea, in addition to simpler snappy/lz4/zlib buffer compression. Would you 
like to make a proposal for improvements to the Arrow metadata to support these 
compression schemes? We should indicate that Arrow implementations are not 
required to support them in general, so for now they can be marked as 
experimental and optional (e.g. we wouldn't necessarily integration-test them). 
For scan-based in-memory columnar workloads, these encodings can yield better 
scan throughput because of better cache efficiency, and many column-oriented 
databases rely on them to achieve high performance, so having them natively in 
the Arrow libraries seems useful. 
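To illustrate the kind of lightweight in-memory encoding discussed above, here is a minimal run-length-encoding sketch in Python. The function names are hypothetical and not part of any Arrow API; this is just a sketch of why such encodings help scan-heavy workloads:

```python
def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# Sorted or low-cardinality columns collapse into a few runs, so a scan
# touches far less memory -- the cache-efficiency win mentioned above.
column = [7, 7, 7, 7, 1, 1, 3]
encoded = rle_encode(column)   # [(7, 4), (1, 2), (3, 1)]
assert rle_decode(encoded) == column
```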

> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>
>                 Key: ARROW-300
>                 URL: https://issues.apache.org/jira/browse/ARROW-300
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>
> It may be useful, if data is to be sent over the wire, to compress the data 
> buffers themselves as they're being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer 
> compression setting in the file Footer. Probably only two compressors worth 
> supporting out of the box would be zlib (higher compression ratios) and lz4 
> (better performance).
> What does everyone think?
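As a rough sketch of the proposed global buffer compression, the following uses Python's standard-library zlib to compress a raw data buffer (lz4 would need a third-party package, so it is omitted; the function names are illustrative, not an Arrow API):

```python
import zlib

def compress_buffer(buf: bytes, level: int = 6) -> bytes:
    """Compress a raw data buffer with zlib (higher ratio, slower than lz4)."""
    return zlib.compress(buf, level)

def decompress_buffer(buf: bytes) -> bytes:
    """Recover the original buffer bytes."""
    return zlib.decompress(buf)

# A repetitive column buffer compresses dramatically; random bytes would not.
raw = b"\x01\x00\x00\x00" * 1024   # e.g. 4 KiB of little-endian int32 ones
packed = compress_buffer(raw)
assert decompress_buffer(packed) == raw
assert len(packed) < len(raw)
```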



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)