[ 
https://issues.apache.org/jira/browse/ARROW-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458249#comment-17458249
 ] 

Antoine Pitrou commented on ARROW-15074:
----------------------------------------

Hmm, I think we should make the spec more precise about this.

Normally, you don't need to emit multiple frames to support streaming 
compression. At least the LZ4 C API allows streaming compression inside a 
single frame. Also, emitting multiple frames is probably worse for compression 
efficiency (because "Each frame is considered independent" as per the [LZ4 
spec|https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md#general-structure-of-lz4-frame-format]).
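
To make "more than one frame" concrete at the byte level, here is a rough sketch that counts the frames concatenated in a buffer by walking the frame headers and block sizes described in the LZ4 frame format document. It is not Arrow's or liblz4's code: the function names are made up for illustration, checksums are not verified, and the helper builds frames from uncompressed ("stored") blocks with a placeholder header checksum byte, so a real decoder would reject its output.

```python
import struct

LZ4_MAGIC = 0x184D2204  # magic number that starts every LZ4 frame


def count_lz4_frames(buf: bytes) -> int:
    """Count concatenated LZ4 frames in buf (hypothetical helper, no checksum checks)."""
    pos, frames = 0, 0
    while pos < len(buf):
        (magic,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if 0x184D2A50 <= magic <= 0x184D2A5F:
            # Skippable frame: 4-byte size followed by user data; not counted.
            (size,) = struct.unpack_from("<I", buf, pos)
            pos += 4 + size
            continue
        if magic != LZ4_MAGIC:
            raise ValueError(f"unexpected magic number {magic:#x}")
        flg = buf[pos]
        # Descriptor: FLG, BD, optional 8-byte content size,
        # optional 4-byte dictionary ID, then the 1-byte header checksum.
        pos += 2 + (8 if flg & 0x08 else 0) + (4 if flg & 0x01 else 0) + 1
        while True:
            (word,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            if word == 0:  # EndMark closes the frame
                break
            # The high bit flags an uncompressed block; the rest is the data size.
            pos += (word & 0x7FFFFFFF) + (4 if flg & 0x10 else 0)
        pos += 4 if flg & 0x04 else 0  # optional content checksum
        frames += 1
    return frames


def make_stored_frame(payload: bytes) -> bytes:
    """Build a frame holding one uncompressed block (illustration only)."""
    # FLG=0x60: version 01, independent blocks, no checksums or content size.
    # BD=0x40: 64 KiB maximum block size. The last byte is a PLACEHOLDER
    # for the header checksum, which this sketch does not compute.
    header = struct.pack("<IBBB", LZ4_MAGIC, 0x60, 0x40, 0x00)
    block = struct.pack("<I", 0x80000000 | len(payload)) + payload
    return header + block + struct.pack("<I", 0)


one = make_stored_frame(b"hello")
two = one + make_stored_frame(b"world")
print(count_lz4_frames(one), count_lz4_frames(two))
```

A reader that accepts a single frame per buffer would accept the first buffer and reject the second, which is the situation reported in this issue.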

> [C++] Support multiple frames in LZ4?
> -------------------------------------
>
>                 Key: ARROW-15074
>                 URL: https://issues.apache.org/jira/browse/ARROW-15074
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Jorge Leitão
>            Priority: Major
>         Attachments: b.arrow
>
>
> When reading an Arrow file whose buffers are LZ4-compressed with multiple 
> frames, we get
> {code:java}
> OSError: Lz4 compressed input contains more than one frame
> {code}
> Attached is an example of such a file, which can be opened with
> {code:java}
> import pyarrow as pa
> import pyarrow.ipc
> with pa.ipc.open_file("b.arrow") as reader:
>     print(reader.get_batch(0))
> {code}
> that fails with the error above.
> The LZ4 frame format allows multiple concatenated frames, and the Arrow spec 
> does not state that a buffer must contain only one frame.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
