[ 
https://issues.apache.org/jira/browse/ARROW-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314805#comment-17314805
 ] 

David Li commented on ARROW-12196:
----------------------------------

I think Micah is right - from the spec, there should be a check for 
{{uncompressed_length == -1}}, in which case we should assume the data is 
already uncompressed. That's distinct from specifying no compression in the 
first place. I don't think anything implements this optimization right now, but 
you could imagine it being useful. (For instance: if you're transcoding Parquet 
to Arrow, you could use the Parquet metadata to infer when compression likely 
isn't worth it. Or you could compress the first few messages of a stream and 
toggle off compression for columns for which it doesn't seem to help.)
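
A minimal sketch of that check, and of the writer-side toggle, might look like 
the following. It assumes the layout described in Message.fbs: an 8-byte 
little-endian signed length prefix followed by the buffer bytes. The names 
{{DecodeBuffer}}, {{EncodeBuffer}} and {{Decompressor}} are hypothetical and 
are not the actual code in reader.cc; the decompressor is passed in as a 
callback so the example stays independent of any particular codec.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical codec hook: (compressed bytes, compressed size,
// expected uncompressed size) -> uncompressed bytes.
using Decompressor =
    std::function<std::vector<uint8_t>(const uint8_t*, size_t, size_t)>;

constexpr size_t kPrefixSize = sizeof(int64_t);

// Read the 8-byte little-endian signed length prefix.
inline int64_t ReadLengthPrefix(const uint8_t* data) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | data[i];
  return static_cast<int64_t>(v);
}

// Append a 64-bit signed value as 8 little-endian bytes.
inline void WriteLengthPrefix(int64_t value, std::vector<uint8_t>* out) {
  const uint64_t v = static_cast<uint64_t>(value);
  for (int i = 0; i < 8; ++i) {
    out->push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
  }
}

// Reader side: a prefix of -1 means the body is stored as-is.
std::vector<uint8_t> DecodeBuffer(const uint8_t* data, size_t size,
                                  const Decompressor& decompress) {
  if (size < kPrefixSize) {
    throw std::invalid_argument("buffer too small for length prefix");
  }
  const int64_t uncompressed_length = ReadLengthPrefix(data);
  const uint8_t* body = data + kPrefixSize;
  const size_t body_size = size - kPrefixSize;
  if (uncompressed_length == -1) {
    // Body was written uncompressed; skip the codec entirely.
    return std::vector<uint8_t>(body, body + body_size);
  }
  if (uncompressed_length < 0) {
    throw std::invalid_argument("invalid uncompressed length");
  }
  return decompress(body, body_size,
                    static_cast<size_t>(uncompressed_length));
}

// Writer side (assumed policy): keep the compressed form only if it is
// strictly smaller than the raw bytes; otherwise write -1 and the raw bytes.
std::vector<uint8_t> EncodeBuffer(const std::vector<uint8_t>& raw,
                                  const std::vector<uint8_t>& compressed) {
  std::vector<uint8_t> out;
  const bool worth_it = compressed.size() < raw.size();
  WriteLengthPrefix(worth_it ? static_cast<int64_t>(raw.size()) : -1, &out);
  const std::vector<uint8_t>& body = worth_it ? compressed : raw;
  out.insert(out.end(), body.begin(), body.end());
  return out;
}
```

The "strictly smaller" rule in {{EncodeBuffer}} is only one possible policy; a 
real writer could just as well decide per column from Parquet metadata or from 
the first few messages of a stream, as suggested above.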

> [C++] C++ IPC reading looks like it doesn't support uncompressed buffers 
> -------------------------------------------------------------------------
>
>                 Key: ARROW-12196
>                 URL: https://issues.apache.org/jira/browse/ARROW-12196
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Micah Kornfield
>            Priority: Major
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.cc#L411 
> does not seem to check for this case (I'm not sure if this is the right code 
> though):
>   uncompressed length may be set to -1 to indicate that the data that follows 
> is not compressed, which can be useful for cases where compression does not 
> yield appreciable savings.
> https://github.com/apache/arrow/blob/5cabd31c90dbb32d87074928f68bf5d6e97e37c6/format/Message.fbs#L59



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
