To be fair, I'm happy to apply it at IPC level. Just didn't realise that was a thing. IIUC what Antoine suggests, though, then just (leaving Python as-is and) changing my Java to

    var is = new FileInputStream(path.toFile());
    var reader = new ArrowStreamReader(is, allocator);
    var schema = reader.getVectorSchemaRoot().getSchema();

(i.e. just get rid of the lz4 input stream) should work, i.e. let the reader
figure it out? I see no option to specify the compression in the reader, so
it might detect it?

This, however, gives,

java.io.IOException: Unexpected end of stream trying to read message.
    at org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:700)
    at org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:57)
    at org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:164)
    at org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:170)
    at org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:161)
    at org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:63)

FWIW - and this makes sense now that I understand there's a difference
between IPC compression and full stream compression - writing it in Python
à la,

    fh = io.BytesIO()
    writer = pa.RecordBatchStreamWriter(fh, table.schema)
    writer.write_table(table)
    writer.close()
    bytes_ = fh.getvalue()
    compressed_bytes = lz4.frame.compress(bytes_, compression_level=3, block_linked=False)
    with open(path, 'wb') as fh:
        fh.write(compressed_bytes)

works fine with the Java from the original email.

-J

On Thu, Jan 28, 2021 at 6:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> It might be worth opening up an issue with the lz4-java library. This
> seems like the java implementation doesn't fully support the LZ4 stream
> protocol?
>
> Antoine in this case it looks like Joris is applying the compression and
> decompression at the file level NOT the IPC level.
>
> On Thu, Jan 28, 2021 at 10:01 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > On 28/01/2021 at 17:59, Joris Peeters wrote:
> > > From Python, I'm dumping an LZ4-compressed arrow stream to a file, using
> > >
> > >     with pa.output_stream(path, compression = 'lz4') as fh:
> > >         writer = pa.RecordBatchStreamWriter(fh, table.schema)
> > >         writer.write_table(table)
> > >         writer.close()
> > >
> > > I then try reading this file from Java, starting with
> > >
> > >     var is = new LZ4FrameInputStream(new FileInputStream(path.toFile()));
> > >
> > > using the lz4-java library. That fails, however, with
> >
> > Well, that sounds expected. LZ4 compression in the IPC format does not
> > work by compressing the whole stream. Instead, buffers in the stream
> > are compressed individually, while metadata is uncompressed.
> >
> > So, you needn't wrap the stream with LZ4 yourself. Instead, just let
> > the Java implementation of Arrow handle compression. It *should* work.
> >
> > Regards
> >
> > Antoine.
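
The distinction Antoine draws above (buffers compressed individually with metadata left uncompressed, versus compressing the whole stream) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `write_framed` and `read_framed` implement a made-up length-prefixed layout, not Arrow's actual IPC format, and `zlib` stands in for LZ4 only because it ships with Python. It merely shows why a reader that expects uncompressed metadata runs off the end of the stream when handed a file that was compressed as a whole.

```python
import io
import json
import struct
import zlib  # stand-in for LZ4, which is not in the standard library


def write_framed(meta, buffers, compress_buffers):
    """Toy layout: [u32 meta length][JSON meta] then [u32 length][buffer] per buffer.

    Hypothetical format for illustration only; this is NOT the Arrow IPC format.
    The metadata is always written uncompressed; buffers are optionally
    compressed one by one.
    """
    out = io.BytesIO()
    meta_bytes = json.dumps(dict(meta, compressed=compress_buffers)).encode("utf-8")
    out.write(struct.pack("<I", len(meta_bytes)))
    out.write(meta_bytes)
    for buf in buffers:
        body = zlib.compress(buf) if compress_buffers else buf
        out.write(struct.pack("<I", len(body)))
        out.write(body)
    return out.getvalue()


def read_framed(data):
    """Read the toy layout; the metadata says whether buffers need decompressing."""
    stream = io.BytesIO(data)
    (meta_len,) = struct.unpack("<I", stream.read(4))
    meta_bytes = stream.read(meta_len)
    if len(meta_bytes) != meta_len:
        # The length prefix was garbage, e.g. because the whole stream is compressed.
        raise EOFError("Unexpected end of stream trying to read metadata")
    meta = json.loads(meta_bytes.decode("utf-8"))
    buffers = []
    while True:
        header = stream.read(4)
        if not header:
            break
        (body_len,) = struct.unpack("<I", header)
        body = stream.read(body_len)
        buffers.append(zlib.decompress(body) if meta["compressed"] else body)
    return meta, buffers


if __name__ == "__main__":
    meta = {"schema": ["a", "b"]}
    bufs = [b"x" * 100, b"y" * 50]

    # Per-buffer compression: the reader handles it on its own.
    print(read_framed(write_framed(meta, bufs, True))[1] == bufs)

    # Whole-stream compression: the caller must unwrap it before reading.
    whole = zlib.compress(write_framed(meta, bufs, False))
    print(read_framed(zlib.decompress(whole))[1] == bufs)
```

In this sketch, passing `whole` straight to `read_framed` raises `EOFError`, because the first four compressed bytes are misread as a metadata length. That is the same mismatch as in the thread: whichever layer applied the compression on the writing side has to be the one that undoes it on the reading side.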