From Python, I'm dumping an LZ4-compressed Arrow stream to a file, using

    with pa.output_stream(path, compression='lz4') as fh:
        writer = pa.RecordBatchStreamWriter(fh, table.schema)
        writer.write_table(table)
        writer.close()

I then try reading this file from Java, starting with

    var is = new LZ4FrameInputStream(new FileInputStream(path.toFile()));

using the lz4-java library. That fails, however, with

    java.lang.RuntimeException: Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set)
        at net.jpountz.lz4.LZ4FrameOutputStream$FLG.validate(LZ4FrameOutputStream.java:367)

so it looks like pyarrow is doing the compression with dependent blocks,
which lz4-java does not support.
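
One way to check this without going through Java is to look at the frame header directly. The sketch below assumes the file starts with a standard LZ4 frame (magic number 0x184D2204, followed by the FLG byte, whose bit 5 is the block-independence flag). The function name is made up for illustration, and skippable or concatenated frames are not handled.

```python
import struct

# Magic number that opens a standard LZ4 frame (stored little-endian on disk).
LZ4_FRAME_MAGIC = 0x184D2204

# Bit 5 of the FLG byte is the block-independence flag.
BLOCK_INDEPENDENCE_BIT = 0x20


def lz4_frame_blocks_independent(path):
    """Return True if the first LZ4 frame in `path` uses independent blocks.

    Hypothetical helper: it only inspects the first 5 bytes of the file.
    """
    with open(path, "rb") as fh:
        header = fh.read(5)
    if len(header) < 5:
        raise ValueError("file is too short to hold an LZ4 frame header")
    # 4-byte little-endian magic number, then the 1-byte FLG field.
    magic, flg = struct.unpack("<IB", header)
    if magic != LZ4_FRAME_MAGIC:
        raise ValueError("file does not start with an LZ4 frame magic number")
    return bool(flg & BLOCK_INDEPENDENCE_BIT)
```

If this returns False for the file written by pyarrow, the blocks are linked, which would match the exception above.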

I suspect I can solve this by doing the LZ4 compression myself, using
Python's lz4 package, and wrapping it around an uncompressed pyarrow output
stream, but I wanted to check whether there's anything obvious I'm missing.

Best,
-J
