[
https://issues.apache.org/jira/browse/ARROW-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584512#comment-17584512
]
Weston Pace commented on ARROW-16921:
-------------------------------------
[~rishabhrana] I think this approach could be slightly improved. What you have
described would require the reader to know ahead of time what compression
method was used on the IPC file. Could you perhaps do something like...
{noformat}
public interface Decoder {
  void Decode(Stream input, Stream output);
}

public interface CompressionProvider {
  Decoder GetDecoder(Apache.Arrow.Flatbuf.CompressionType compressionType);
}

public class LZ4Decoder : Decoder {
  // Must be named Decode (not Lz4Decode) to implement the interface.
  public void Decode(Stream input, Stream output) {
    LZ4Stream.Decode(input).CopyTo(output);
  }
}

public class LZ4OnlyCompressionProvider : CompressionProvider {
  private readonly LZ4Decoder _lz4Decoder = new LZ4Decoder();

  public Decoder GetDecoder(Apache.Arrow.Flatbuf.CompressionType compressionType) {
    if (compressionType == CompressionType.LZ4_FRAME) {
      return _lz4Decoder;
    } else {
      throw new NotImplementedException();
    }
  }
}

// Proposed constructor overload; the reader asks the provider for a decoder.
var reader = new ArrowStreamReader(stream, new LZ4OnlyCompressionProvider());
{noformat}
This opens the door for a future where someone could write a compression
provider that supports both LZ4 and gzip (I think gzip is part of
System.IO.Compression, so it should be very easy to support).
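
To make the multi-codec idea concrete, here is a rough sketch of a provider that
looks decoders up in a registry. It is only an illustration under several
assumptions: the two interfaces are repeated so the snippet compiles on its own,
CodecKind is a hypothetical stand-in for Apache.Arrow.Flatbuf.CompressionType
(which, as far as I know, has no gzip value today), and RegistryCompressionProvider
and GzipDecoder are made-up names. Only the gzip path is implemented, because it
needs nothing beyond System.IO.Compression.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical stand-in for Apache.Arrow.Flatbuf.CompressionType so the sketch
// is self-contained. Gzip is an assumption, not an existing Arrow enum value.
public enum CodecKind { Lz4Frame, Zstd, Gzip }

public interface Decoder
{
    void Decode(Stream input, Stream output);
}

public interface CompressionProvider
{
    Decoder GetDecoder(CodecKind kind);
}

public sealed class GzipDecoder : Decoder
{
    public void Decode(Stream input, Stream output)
    {
        // leaveOpen: true so the caller keeps ownership of the input stream.
        using (var gzip = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(output);
        }
    }
}

public sealed class RegistryCompressionProvider : CompressionProvider
{
    private readonly Dictionary<CodecKind, Decoder> _decoders =
        new Dictionary<CodecKind, Decoder>();

    // Returns this so that registrations can be chained.
    public RegistryCompressionProvider Register(CodecKind kind, Decoder decoder)
    {
        _decoders[kind] = decoder;
        return this;
    }

    public Decoder GetDecoder(CodecKind kind)
    {
        if (_decoders.TryGetValue(kind, out var decoder))
        {
            return decoder;
        }
        throw new NotSupportedException($"No decoder registered for {kind}.");
    }
}
```

A caller would then build the provider with something like
new RegistryCompressionProvider().Register(CodecKind.Gzip, new GzipDecoder()),
and an LZ4 decoder could be registered the same way. The reader never needs to
know in advance which codec the file uses; it just asks the provider.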
> [C#] Add decompression support for Record Batches
> -------------------------------------------------
>
> Key: ARROW-16921
> URL: https://issues.apache.org/jira/browse/ARROW-16921
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C#
> Reporter: Rishabh Rana
> Assignee: Rishabh Rana
> Priority: Major
>
> The C# implementation does not support reading batches written by other
> implementations of Arrow when compression is specified in the IPC write
> options.
> e.g. Reading this batch from pyarrow in C# will fail:
> pyarrow.ipc.RecordBatchStreamWriter(sink, schema,
> options=pyarrow.ipc.IpcWriteOptions(compression="lz4"))
>
> This is to support decompression (lz4 & zstd) in the C# implementation.