mdedetrich opened a new pull request, #2409:
URL: https://github.com/apache/pekko/pull/2409

   Resolves: https://github.com/apache/pekko/issues/2404
   
   This PR adds a zstd compression/decompression stream/flow using 
[zstd-jni](https://github.com/luben/zstd-jni). This project was chosen because 
it's the only one that has high performance (it uses the [reference zstd 
implementation](https://github.com/facebook/zstd) via JNI) and also supports at 
least JDK 17 (it's published with JDK 11).
   
   The PR still needs to be completed (documentation and MiMa exclusions 
need to be added), but I am opening it early with the bare essentials 
so that people can comment on whether it is on the right track. 
Tests have been added (there is already a base testing framework for 
pekko-streams compression flows).
   
   The implementation of `ZstdCompressor`/`ZstdDecompressor` uses 
`ZstdDirectBufferCompressingStreamNoFinalizer`/`ZstdDirectBufferDecompressingStreamNoFinalizer`,
 as these are the abstractions provided by zstd-jni for streaming 
compression. Note that the `NoFinalizer` variants just mean that you need 
to explicitly shut down the resource (which is what we want since Pekko Streams 
handles resource cleanup). These compression abstractions use direct 
`ByteBuffer`s to move data across the JNI boundary so that 
the C implementation can do its work directly in memory. The [zstd-jni 
tests](https://github.com/luben/zstd-jni/blob/9c3386d306086078155f58116a4d905e07239db4/src/test/scala/Zstd.scala)
 were the basis used to write the implementation.
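
   To illustrate the direct-buffer pattern described above, here is a minimal 
sketch. It is an assumption-laden stand-in: it uses the JDK's own 
`java.util.zip.Deflater`/`Inflater` (JDK 11+ `ByteBuffer` overloads) instead of 
zstd-jni, so it only shows the general shape (fill a direct input buffer, let 
the native codec write into a direct output buffer, drain it, and release the 
native resource explicitly). The class name `DirectBufferCodecSketch` and its 
methods are hypothetical and not part of this PR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in sketch: java.util.zip plays the role of the native codec here.
// This is NOT the zstd-jni API. bufferSize is assumed to be positive.
final class DirectBufferCodecSketch {

    static byte[] compress(byte[] input, int bufferSize) {
        Deflater deflater = new Deflater();
        // Direct buffers live outside the Java heap, so native code can use them in place.
        ByteBuffer source = ByteBuffer.allocateDirect(bufferSize);
        ByteBuffer target = ByteBuffer.allocateDirect(bufferSize);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int offset = 0; offset < input.length; offset += bufferSize) {
                int length = Math.min(bufferSize, input.length - offset);
                source.clear();
                source.put(input, offset, length);
                source.flip();
                deflater.setInput(source);
                // Keep compressing until this chunk has been fully consumed.
                while (!deflater.needsInput()) {
                    deflater.deflate(target);
                    drain(target, out);
                }
            }
            deflater.finish();
            // Emit whatever the codec still holds internally.
            while (!deflater.finished()) {
                deflater.deflate(target);
                drain(target, out);
            }
        } finally {
            deflater.end(); // explicit release, no reliance on finalizers
        }
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed, int bufferSize) {
        Inflater inflater = new Inflater();
        ByteBuffer source = ByteBuffer.allocateDirect(bufferSize);
        ByteBuffer target = ByteBuffer.allocateDirect(bufferSize);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int offset = 0;
            while (!inflater.finished()) {
                if (inflater.needsInput()) {
                    if (offset >= compressed.length) {
                        throw new IllegalArgumentException("truncated input");
                    }
                    int length = Math.min(bufferSize, compressed.length - offset);
                    source.clear();
                    source.put(compressed, offset, length);
                    source.flip();
                    offset += length;
                    inflater.setInput(source);
                }
                int produced = inflater.inflate(target);
                drain(target, out);
                if (produced == 0 && inflater.needsDictionary()) {
                    throw new IllegalArgumentException("preset dictionary required");
                }
            }
        } catch (DataFormatException e) {
            // Codec-specific failure surfaced as a generic unchecked exception.
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Copies the bytes written so far out of the direct buffer and resets it for reuse.
    private static void drain(ByteBuffer target, ByteArrayOutputStream out) {
        target.flip();
        byte[] chunk = new byte[target.remaining()];
        target.get(chunk);
        out.write(chunk, 0, chunk.length);
        target.clear();
    }
}
```

   With zstd-jni the shape should be similar, with `close()` on the 
`NoFinalizer` streams taking the place of `end()` here. The actual calls in the 
PR may differ from this sketch.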
   
   Some extra notes
   
   - `CoderSpec` had to be modified, as the test that expects an exception 
on corrupt input was hardcoded to `DataFormatException`, whereas zstd 
throws its own bespoke exception on corrupt input
   - The `ZstdCompressor` implements the `Compressor` abstraction, which does a 
lot of heavy lifting (especially when it comes to tests). However, the 
`ZstdDecompressor` intentionally does not implement `DeflateDecompressorBase`, 
as that design is heavily tied to Java's deflate/compression APIs. Instead we 
use `SimpleLinearGraphStage[ByteString]` backed by 
`ZstdDirectBufferDecompressingStreamNoFinalizer`
   - The current API also allows you to specify a 
[dictionary](https://github.com/facebook/zstd?tab=readme-ov-file#dictionary-compression-how-to)
 when doing compression. Note that to do this, you need to pass a 
`com.github.luben.zstd.ZstdDictCompress` data structure, which is tied to the 
implementation of zstd-jni. There is an argument for creating our own Pekko 
equivalent of `ZstdDictCompress` that internally maps to a 
`com.github.luben.zstd.ZstdDictCompress`. Doing so would allow us to swap to a 
different implementation of zstd without breaking the API.
     - This is the only part of the API that is tied to the zstd-jni 
implementation
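
   As a rough illustration of the Pekko-owned dictionary type discussed in the 
last bullet, here is a minimal sketch. Everything in it is an assumption: the 
name `ZstdDictionary`, its fields, and the `toNative` hook are hypothetical and 
not part of this PR. The idea is that the public type carries only plain data 
(dictionary bytes and compression level), and the conversion to the backend's 
own type happens internally.

```java
import java.util.Arrays;
import java.util.function.BiFunction;

// Hypothetical Pekko-owned dictionary type. It holds plain data only,
// so no zstd-jni type appears in its signature.
final class ZstdDictionary {
    private final byte[] bytes;
    private final int level;

    ZstdDictionary(byte[] bytes, int level) {
        if (bytes == null || bytes.length == 0) {
            throw new IllegalArgumentException("dictionary must not be empty");
        }
        // Defensive copy so later changes to the caller's array do not leak in.
        this.bytes = Arrays.copyOf(bytes, bytes.length);
        this.level = level;
    }

    byte[] bytes() {
        return Arrays.copyOf(bytes, bytes.length);
    }

    int level() {
        return level;
    }

    // Internal conversion hook. With zstd-jni as the backend, the factory could be
    // (b, l) -> new com.github.luben.zstd.ZstdDictCompress(b, l);
    // a different backend would pass its own factory instead.
    <T> T toNative(BiFunction<byte[], Integer, T> factory) {
        return factory.apply(bytes(), level);
    }
}
```

   With something like this, swapping zstd-jni for another implementation 
would only change the factory passed internally, not the user-facing API.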


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

