[GitHub] [arrow] adamreeve opened a new pull request, #33603: ARROW-16921: [C#] Support decompression of IPC format buffers

GitBox Tue, 10 Jan 2023 18:32:32 -0800


adamreeve opened a new pull request, #33603:
URL: https://github.com/apache/arrow/pull/33603


   # Which issue does this PR close?
   
   Closes https://issues.apache.org/jira/browse/ARROW-16921
   
   # What changes are included in this PR?
   
   This PR implements decompression support for Arrow IPC format files and 
streams in the dotnet/C# library.
   
   The main concern raised in the Jira issue above was that we don't want to 
add new NuGet package dependencies for decompression formats that most users 
won't need. To address this, a default `CompressionProvider` implementation 
has been added that uses reflection to load the `ZstdNet` package for ZSTD 
decompression, and `K4os.Compression.LZ4.Streams` together with 
`CommunityToolkit.HighPerformance` for LZ4 Frame support, if they are 
available. Decompression support is disabled for the `netstandard1.3` target 
because some of the required reflection functionality is missing there, and 
neither `ZstdNet` nor `K4os.Compression.LZ4.Streams` supports `netstandard1.3`.
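   
   As a rough illustration of the reflection approach (this is not the code 
in this PR), the availability of an optional package can be probed at run time 
roughly as below. `OptionalDependency` and `TryResolve` are hypothetical names, 
and the type and assembly name `ZstdNet.Decompressor, ZstdNet` is an assumption 
about that package's layout.
   
   ```csharp
   using System;

   // Hypothetical helper, not the PR's actual implementation: it probes for
   // an optional assembly at run time instead of referencing it at compile time.
   public static class OptionalDependency
   {
       // Returns the resolved type, or null when the type is not found.
       public static Type TryResolve(string assemblyQualifiedName)
       {
           // throwOnError: false makes Type.GetType return null for a missing
           // type rather than throwing; some load failures can still throw.
           return Type.GetType(assemblyQualifiedName, throwOnError: false);
       }
   }

   public static class Program
   {
       public static void Main()
       {
           // Assumed type and assembly name for the ZstdNet package.
           Type zstd = OptionalDependency.TryResolve("ZstdNet.Decompressor, ZstdNet");

           Console.WriteLine(zstd != null
               ? "ZstdNet found: ZSTD decompression can be enabled"
               : "ZstdNet not found: ZSTD-compressed buffers cannot be read");
       }
   }
   ```
   
   The point of the pattern is that `Apache.Arrow` itself carries no 
compile-time reference to the optional package, so users who never read 
compressed files do not need to install it.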
   
   The `ArrowFileReader` and `ArrowStreamReader` constructors accept an 
`ICompressionProvider` parameter to allow users to provide their own 
compression provider if they want to use different dependencies.
   
   ### Alternatives to consider
   
   An alternative approach that could be considered instead of reflection is to 
use these extra dependencies as build time dependencies but not make them 
dependencies of the NuGet package. I tested this out in 
https://github.com/adamreeve/arrow/commit/4544afde6fef12337c7b188cc497da0bc1bf829d
 and it seems to work reasonably well too, but it required bumping the version 
of `System.Runtime.CompilerServices.Unsafe` under the `netstandard2.0` and 
`netcoreapp3.1` targets. This removes the reflection boilerplate but seems 
pretty hacky, as Apache.Arrow.dll then depends on these extra DLLs and we rely 
on the dotnet runtime not trying to load them until they're used. So I think 
the reflection approach is better.
   
   Another alternative would be to move decompression support into a separate 
NuGet package (e.g. `Apache.Arrow.Compression`) that depends on `Apache.Arrow` 
and has an implementation of `ICompressionProvider` that users can pass in to 
the `ArrowFileReader` constructor, or maybe has a way to register itself with 
the `Apache.Arrow` package so it only needs to be configured once. That would 
seem cleaner to me but I'm not sure how much work it would be to set up a whole 
new package.
   
   # Are these changes tested?
   
   Yes, new unit tests have been added. Because this PR adds only 
decompression support and not compression support, the test files were created 
with a Python script, which is included in the PR.
   
   # Are there any user-facing changes?
   
   Yes, this implements a new feature, but in a backwards-compatible way.


