[jira] [Commented] (ARROW-16921) [C#] Add decompression support for Record Batches

Adam Reeve (Jira) Wed, 07 Sep 2022 15:05:03 -0700


    [ https://issues.apache.org/jira/browse/ARROW-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601527#comment-17601527 ]


Adam Reeve commented on ARROW-16921:
------------------------------------

Hi, we're interested in this feature at G-Research and believe compression
support is important for use at scale. We're keen to help out where we can. I
agree that it would be nice if compression support could be added more
automatically, without users needing to implement decoders themselves. An
alternative approach that builds on the Type.GetType idea would be to provide a
wrapper package for each compression format used in the IPC format (currently
only Zstd and LZ4, I believe). These packages could provide implementations of
an IDecoder interface defined in the main dotnet Arrow library. So instead of
getting the LZ4Stream type with Type.GetType, for example, we could do
something like the following, which uses reflection only to locate and
instantiate the decoder and then works with it through the IDecoder interface:
{code:java}
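// With throwOnError: false, Type.GetType returns null rather than throwing,
// e.g. when the optional wrapper package is not referenced.
var lz4DecoderType = Type.GetType(
    "Apache.Arrow.Compression.Lz4.Lz4Decoder, Apache.Arrow.Compression.Lz4",
    throwOnError: false);
if (lz4DecoderType != null)
{
    if (Activator.CreateInstance(lz4DecoderType) is IDecoder decoder)
    {
        // use decoder through the IDecoder interface; no further reflection needed
    }
    else
    {
        throw new InvalidOperationException("Failed to cast Lz4Decoder to IDecoder");
    }
}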
{code}
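To make the wrapper-package idea a bit more concrete, here is a rough C# sketch of what the main library might own: the IDecoder interface plus a small registry that resolves a decoder for each of the two IPC codecs (LZ4 frame and Zstd) the first time it is needed. Everything here is hypothetical: the Decompress signature, the IpcCompression enum, DecoderRegistry, and the Zstd wrapper type name (modelled on the Lz4 name in the snippet) are illustrative assumptions, not existing Arrow APIs.

{code:java}
using System;
using System.Collections.Generic;

// Hypothetical interface owned by the main Arrow library; each wrapper
// package would provide one implementation.
public interface IDecoder
{
    // Decompresses source into destination and returns the number of bytes written.
    int Decompress(ReadOnlySpan<byte> source, Span<byte> destination);
}

// The two body compression codecs defined by the Arrow IPC format.
public enum IpcCompression
{
    Lz4Frame,
    Zstd,
}

public static class DecoderRegistry
{
    // Hypothetical assembly-qualified type names in the wrapper packages.
    private static readonly Dictionary<IpcCompression, string> TypeNames = new()
    {
        [IpcCompression.Lz4Frame] = "Apache.Arrow.Compression.Lz4.Lz4Decoder, Apache.Arrow.Compression.Lz4",
        [IpcCompression.Zstd] = "Apache.Arrow.Compression.Zstd.ZstdDecoder, Apache.Arrow.Compression.Zstd",
    };

    // One decoder is created per codec; this assumes implementations are
    // stateless and safe to share between readers.
    private static readonly Dictionary<IpcCompression, IDecoder> Cache = new();

    public static IDecoder GetDecoder(IpcCompression compression)
    {
        lock (Cache)
        {
            if (Cache.TryGetValue(compression, out var cached))
            {
                return cached;
            }

            string typeName = TypeNames[compression];
            // Returns null instead of throwing if the wrapper assembly is missing.
            var type = Type.GetType(typeName, throwOnError: false);
            if (type == null)
            {
                string package = typeName.Split(',')[1].Trim();
                throw new NotSupportedException(
                    $"Reading {compression}-compressed record batches requires the {package} package.");
            }

            if (Activator.CreateInstance(type) is not IDecoder decoder)
            {
                throw new InvalidOperationException($"{type.FullName} does not implement IDecoder.");
            }

            Cache[compression] = decoder;
            return decoder;
        }
    }
}
{code}

In this sketch a reader would only call GetDecoder after encountering a compressed batch, so applications that never read compressed data would not need either wrapper package, and those that do get a clear error naming the missing package.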
Having to maintain these extra wrapper packages would be a bit more work
operationally, though. Simply adding the new dependencies directly to the Arrow
package seems a lot more straightforward, and given that only two compression
formats are currently used, would two extra dependencies really be a problem?

On a more minor point, would Decompressor be a more precise term than
Decoder? At least in the Parquet world, which I'm a bit more familiar with,
encodings are a separate concept from compression formats.

> [C#] Add decompression support for Record Batches
> -------------------------------------------------
>
>                 Key: ARROW-16921
>                 URL: https://issues.apache.org/jira/browse/ARROW-16921
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C#
>            Reporter: Rishabh Rana
>            Assignee: Rishabh Rana
>            Priority: Major
>
> The C# implementation does not support reading batches written by other
> Arrow implementations when compression is specified in the IPC write
> options.
> For example, reading a batch written by this pyarrow writer in C# will fail:
> pyarrow.ipc.RecordBatchStreamWriter(sink, schema,
> options=pyarrow.ipc.IpcWriteOptions(compression="lz4"))
>  
> This issue is to add decompression support (LZ4 and Zstd) to the C# implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
