mdedetrich opened a new pull request, #2409: URL: https://github.com/apache/pekko/pull/2409
Resolves: https://github.com/apache/pekko/issues/2404

This PR adds a zstd compression/decompression stream/flow using [zstd-jni](https://github.com/luben/zstd-jni). This project was chosen because it's the only one that has high performance (it uses the [reference zstd implementation](https://github.com/facebook/zstd) via JNI) and also supports at least JDK 17 (it's published with JDK 11).

The PR still needs to be completed (documentation needs to be added, along with MiMa exclusions), but I am creating it now with the necessary bare bones so that people can comment on whether it is on the right track. Tests have been added (there is already a base testing framework for pekko-streams compression flows).

The implementation of `ZstdCompressor`/`ZstdDecompressor` uses `ZstdDirectBufferCompressingStreamNoFinalizer`/`ZstdDirectBufferDecompressingStreamNoFinalizer`, as these are the abstractions provided by zstd-jni for streaming compression. The `NoFinalizer` variants just mean that you need to shut down the resource explicitly, which is what we want since Pekko Streams handles resource cleanup. These abstractions use direct `ByteBuffer`s to move data across the JNI boundary so that the C implementation can do its work directly in memory. The [zstd-jni tests](https://github.com/luben/zstd-jni/blob/9c3386d306086078155f58116a4d905e07239db4/src/test/scala/Zstd.scala) were the basis for the implementation.
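As a rough illustration of the direct-buffer pattern described above, here is a minimal sketch that uses the JDK's `java.util.zip.Deflater`/`Inflater` as stand-ins, because zstd-jni is not part of the standard library. It is not the code in this PR and it does not use zstd; the class name `DirectBufferRoundTrip`, the 64-byte output buffer, and the helper names are assumptions made for the example. What it shares with the approach above is that input and output go through direct `ByteBuffer`s and that the native resource is released explicitly rather than by a finalizer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example class; Deflater/Inflater stand in for the zstd-jni streams.
class DirectBufferRoundTrip {

    // Copies whatever was written into `dst` to `out`, then resets `dst` for reuse.
    private static void drain(ByteBuffer dst, ByteArrayOutputStream out) {
        dst.flip();
        byte[] chunk = new byte[dst.remaining()];
        dst.get(chunk);
        out.write(chunk, 0, chunk.length);
        dst.clear();
    }

    static byte[] compress(byte[] input) {
        // Direct buffers live outside the Java heap, so native code can read them in place.
        ByteBuffer src = ByteBuffer.allocateDirect(input.length);
        src.put(input).flip();
        // Deliberately small so that the loop below runs several times.
        ByteBuffer dst = ByteBuffer.allocateDirect(64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(src);
            deflater.finish(); // no more input will follow
            while (!deflater.finished()) {
                deflater.deflate(dst);
                drain(dst, out);
            }
        } finally {
            deflater.end(); // explicit release of the native resource
        }
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        ByteBuffer src = ByteBuffer.allocateDirect(compressed.length);
        src.put(compressed).flip();
        ByteBuffer dst = ByteBuffer.allocateDirect(64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(src);
            while (!inflater.finished()) {
                int written = inflater.inflate(dst);
                if (written == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                drain(dst, out);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end(); // explicit release of the native resource
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "pekko-streams ".repeat(200).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(data);
        System.out.println(data.length + " bytes compressed to " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(data, decompress(packed)));
    }
}
```

In a stream stage the same release call would belong in the stage's cleanup hook rather than in a `finally` block, so that it also runs on failure and cancellation.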
Some extra notes

- `CoderSpec` had to be modified, as the test that checks the exception thrown on corrupt input was hardcoded to `DataFormatException`, whereas zstd throws its own bespoke exception on corrupt input.
- The `ZstdCompressor` implements the `Compressor` abstraction, which does a lot of heavy lifting (especially when it comes to tests). However, the `ZstdDecompressor` intentionally does not implement `DeflateDecompressorBase`, as its design is heavily tied to Java's deflate/compression APIs. Instead we use `SimpleLinearGraphStage[ByteString]` backed by `ZstdDirectBufferDecompressingStreamNoFinalizer`.
- The current API also allows you to specify a [dictionary](https://github.com/facebook/zstd?tab=readme-ov-file#dictionary-compression-how-to) when doing compression. Note that to do this, you need to pass a `com.github.luben.zstd.ZstdDictCompress` data structure, which is tied to the implementation of zstd-jni. There is an argument for creating our own Pekko equivalent of `ZstdDictCompress` that internally maps to a `com.github.luben.zstd.ZstdDictCompress`; doing so would allow us to swap to a different implementation of zstd without breaking the API.
  - This is the only part of the API that is tied to the zstd-jni implementation.
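The `CoderSpec` note can be pictured with a small sketch of a corrupt-input check that takes the expected exception type as a parameter instead of hardcoding `DataFormatException`. This is an illustration only, not the actual `CoderSpec` change: `CorruptInputCheck`, `Decoder`, and `failsWith` are hypothetical names, and the JDK's `Inflater` is used as the decoder so the example runs without zstd-jni. The concrete exception class that zstd raises is not named here, so it is left as an argument.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper; not part of Pekko or zstd-jni.
class CorruptInputCheck {

    // Stand-in for "some decoder under test"; may fail with any exception type.
    interface Decoder {
        byte[] decode(byte[] input) throws Exception;
    }

    // A deflate decoder from the JDK, which reports corrupt input as DataFormatException.
    static byte[] inflate(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            byte[] buf = new byte[256];
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // True only if decoding fails with an instance of the expected exception type.
    static boolean failsWith(Decoder decoder, byte[] corrupt, Class<? extends Throwable> expected) {
        try {
            decoder.decode(corrupt);
            return false; // decoding succeeded, so the input was not rejected
        } catch (Throwable t) {
            return expected.isInstance(t);
        }
    }

    public static void main(String[] args) {
        byte[] garbage = {1, 2, 3, 4, 5};
        System.out.println(failsWith(CorruptInputCheck::inflate, garbage, DataFormatException.class));
    }
}
```

With this shape, each codec's spec supplies its own expected exception class while the corrupt-input test body stays shared.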
