mdedetrich commented on PR #2409: URL: https://github.com/apache/pekko/pull/2409#issuecomment-3506717207
> zstd is indeed quite complex. In our work, it actually includes a complete platform and toolchain for dictionary training, end-to-end deployment, version synchronization, etc.

Actually the zstd API is incredibly simple, especially if we go with the dumb solution of doing `ByteString.fromArrayUnsafe(zstdCompressCtx.compress(input.toArrayUnsafe()))`. In that case the compressor would be just a few lines of code, far simpler than what is done in [gzip/deflate](https://github.com/apache/pekko/blob/b0fdac259bd57fdd481483f3fe9a7aec6e1ff38a/stream/src/main/scala/org/apache/pekko/stream/impl/io/compression/DeflateCompressor.scala), which has to deal with arbitrary behavior in the JVM implementation.

The complexity I am describing comes from doing something very smart that is only possible with pekko-streams, i.e. taking downstream demand into account when buffering; AFAIK no other implementation does this. The complexity you describe, on the other hand, comes from your specific use case at work. Training dictionaries would be entirely out of scope here, but of course you can specify a dictionary in the same way you specify a compression level, and it's a single line of code.

> Therefore, could we consider releasing a separate zstd toolkit, such as pekko-zstd or pekko-http-zstd, etc.? This would allow for a more complete design and implementation, and also enable independent evolution.

Assuming the initial implementation is correct, I don't see how this could evolve aside from version bumps in zstd-jni (at least under the assumption that dictionary training is out of scope).

> Without a dictionary, zstd is simply a slightly faster gzip implementation.

It's not just faster, it also has significantly better compression ratios. And "slightly" is downplaying it a bit: it can be many times faster (of course this depends on the level and where the bottleneck is).
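To make the "dumb solution" concrete, here is a minimal sketch assuming zstd-jni's `ZstdCompressCtx` and Pekko's `ByteString`. The object name `NaiveZstd` and its `level`/`dictionary` parameters are hypothetical and are not the API proposed in the PR. Each input chunk is compressed in one shot into a complete zstd frame, and both the compression level and an optional dictionary are set with a single call each:

```scala
import com.github.luben.zstd.ZstdCompressCtx
import org.apache.pekko.util.ByteString

// Hypothetical helper (not the PR's API): a sketch of the "dumb" one-shot
// approach, where every input chunk becomes one complete, independent zstd frame.
object NaiveZstd {
  def compress(
      input: ByteString,
      level: Int = 3, // 3 is zstd's default compression level
      dictionary: Option[Array[Byte]] = None): ByteString = {
    val ctx = new ZstdCompressCtx()
    try {
      ctx.setLevel(level)                      // compression level: one call
      dictionary.foreach(d => ctx.loadDict(d)) // optional dictionary: also one call
      // Compress the whole chunk at once and wrap the result without copying it.
      ByteString.fromArrayUnsafe(ctx.compress(input.toArrayUnsafe()))
    } finally ctx.close() // release the native zstd context
  }
}
```

In a stream this could be used as `Flow[ByteString].map(NaiveZstd.compress(_))`. Since the zstd format allows a stream to consist of several concatenated frames, the output remains decodable, but every chunk is compressed without shared context and without any regard for downstream demand, which is exactly the buffering behavior a smarter pekko-streams implementation could exploit.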
