[
https://issues.apache.org/jira/browse/ARROW-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314805#comment-17314805
]
David Li commented on ARROW-12196:
----------------------------------
I think Micah is right: per the spec, the reader should check for
{{uncompressed_length == -1}}, in which case it should treat the buffer body
as already uncompressed. That's distinct from specifying no compression in the
first place. I don't think anything implements this optimization right now, but
you could imagine it being useful. (For instance: if you're transcoding Parquet
to Arrow, you could use the Parquet metadata to infer when compression likely
isn't worth it. Or you could compress the first few messages of a stream and
toggle off compression for columns for which it doesn't seem to help.)
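To make the check concrete, here is a minimal standalone sketch of the reader-side logic. It is not the actual reader.cc code. It assumes the layout described in Message.fbs, where each compressed buffer starts with an 8-byte little-endian signed length. {{DecodeBuffer}}, {{ReadLengthPrefix}}, {{kUncompressedSentinel}} and the {{Decompressor}} callback are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical decompressor: (compressed bytes, compressed size,
// expected uncompressed size) -> uncompressed bytes.
using Decompressor =
    std::function<std::vector<uint8_t>(const uint8_t*, size_t, size_t)>;

// Sentinel from the spec: the bytes after the prefix are not compressed.
constexpr int64_t kUncompressedSentinel = -1;

// Reads the 8-byte little-endian signed length prefix.
int64_t ReadLengthPrefix(const uint8_t* data) {
  assert(data != nullptr);
  uint64_t raw = 0;
  for (int i = 0; i < 8; ++i) {
    raw |= static_cast<uint64_t>(data[i]) << (8 * i);
  }
  int64_t value = 0;
  std::memcpy(&value, &raw, sizeof(value));
  return value;
}

// Decodes one buffer of a message whose metadata declares compression.
std::vector<uint8_t> DecodeBuffer(const std::vector<uint8_t>& buf,
                                  const Decompressor& decompress) {
  if (buf.size() < 8) {
    throw std::invalid_argument("buffer too short for length prefix");
  }
  const int64_t uncompressed_length = ReadLengthPrefix(buf.data());
  const uint8_t* body = buf.data() + 8;
  const size_t body_size = buf.size() - 8;

  if (uncompressed_length == kUncompressedSentinel) {
    // The writer skipped compression for this buffer: return the body as-is.
    return std::vector<uint8_t>(body, body + body_size);
  }
  if (uncompressed_length < 0) {
    throw std::invalid_argument("invalid uncompressed length");
  }
  return decompress(body, body_size,
                    static_cast<size_t>(uncompressed_length));
}
```

The point is that the sentinel is decided per buffer, so a single message could mix compressed and raw buffers while still declaring a codec in its metadata.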
> [C++] C++ IPC reading looks like it doesn't support uncompressed buffers
> -------------------------------------------------------------------------
>
> Key: ARROW-12196
> URL: https://issues.apache.org/jira/browse/ARROW-12196
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Micah Kornfield
> Priority: Major
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.cc#L411
> does not seem to check for the following case (I'm not sure if this is the
> right code though):
> uncompressed length may be set to -1 to indicate that the data that follows
> is not compressed, which can be useful for cases where compression does not
> yield appreciable savings.
> https://github.com/apache/arrow/blob/5cabd31c90dbb32d87074928f68bf5d6e97e37c6/format/Message.fbs#L59