mdedetrich commented on PR #2409:
URL: https://github.com/apache/pekko/pull/2409#issuecomment-3506717207

   > zstd is indeed quite complex. In our work, it actually includes a complete 
platform and toolchain for dictionary training, end-to-end deployment, version 
synchronization, etc.
   
   Actually the zstd API is incredibly simple, especially if we decide on the 
dumb solution of doing 
`ByteString.fromArrayUnsafe(zstdCompressCtx.compress(input.toArrayUnsafe()))`. 
In that case the compressor would just be a few lines of code, and it's far 
simpler than what is done in 
[gzip/deflate](https://github.com/apache/pekko/blob/b0fdac259bd57fdd481483f3fe9a7aec6e1ff38a/stream/src/main/scala/org/apache/pekko/stream/impl/io/compression/DeflateCompressor.scala)
 which has to deal with arbitrary behavior in the JVM implementation.
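   To make the "dumb" approach concrete, here is a minimal sketch assuming zstd-jni's `ZstdCompressCtx` and a plain `Flow.map`. The object and method names are illustrative, not the proposed implementation; each incoming chunk becomes an independent zstd frame.

   ```scala
   import com.github.luben.zstd.ZstdCompressCtx
   import org.apache.pekko.NotUsed
   import org.apache.pekko.stream.scaladsl.Flow
   import org.apache.pekko.util.ByteString

   object ZstdCompressorSketch {
     // Compress each ByteString element as a standalone zstd frame.
     // A real implementation would reuse one context per materialization
     // instead of allocating one per element.
     def flow(level: Int = 3): Flow[ByteString, ByteString, NotUsed] =
       Flow[ByteString].map { input =>
         val ctx = new ZstdCompressCtx()
         try {
           ctx.setLevel(level)
           ByteString.fromArrayUnsafe(ctx.compress(input.toArrayUnsafe()))
         } finally ctx.close()
       }
   }
   ```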
   
   The complexity that I am describing comes from doing something very smart 
that is only possible with pekko-streams, i.e. taking downstream demand into 
account when buffering; afaik no other implementation does this.
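   To illustrate (this is a simplified sketch, not the referenced gzip/deflate code): with a custom `GraphStage`, upstream data is only pulled and compressed when downstream has signalled demand, so the stage never buffers ahead of the consumer. The stage name and the injected `compress` function are hypothetical.

   ```scala
   import org.apache.pekko.stream.{Attributes, FlowShape, Inlet, Outlet}
   import org.apache.pekko.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
   import org.apache.pekko.util.ByteString

   // Demand-aware sketch: onPull (downstream demand) is what triggers the
   // pull from upstream, so compression work tracks consumer demand.
   final class DemandAwareCompressStage(compress: ByteString => ByteString)
       extends GraphStage[FlowShape[ByteString, ByteString]] {
     val in: Inlet[ByteString] = Inlet("DemandAwareCompress.in")
     val out: Outlet[ByteString] = Outlet("DemandAwareCompress.out")
     override val shape: FlowShape[ByteString, ByteString] = FlowShape(in, out)

     override def createLogic(attrs: Attributes): GraphStageLogic =
       new GraphStageLogic(shape) with InHandler with OutHandler {
         // Data arrives only because downstream asked for it; compress and emit.
         override def onPush(): Unit = push(out, compress(grab(in)))
         // Request a chunk from upstream only when downstream pulls.
         override def onPull(): Unit = pull(in)
         setHandlers(in, out, this)
       }
   }
   ```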
   
   On the other hand, the complexity you describe comes from your specific use 
case at work. Training dictionaries would be entirely out of scope for this, 
but of course you can specify a dictionary in the same way you can specify a 
compression level, and it's a single line of code.
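   For example (a hypothetical helper, assuming zstd-jni's `loadDict`), wiring in a pre-trained dictionary is one extra call on the same context used to set the level:

   ```scala
   import com.github.luben.zstd.ZstdCompressCtx

   // Illustrative only: the dictionary, if provided, is configured exactly
   // like the compression level, via a single call on the context.
   def configuredCtx(level: Int, dictionary: Option[Array[Byte]]): ZstdCompressCtx = {
     val ctx = new ZstdCompressCtx()
     ctx.setLevel(level)
     dictionary.foreach(d => ctx.loadDict(d)) // the "single line" for dictionary support
     ctx
   }
   ```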
   
   > Therefore, could we consider releasing a separate zstd toolkit, such as 
pekko-zstd or pekko-http-zstd, etc.? This would allow for a more complete 
design and implementation, and also enable independent evolution.
   
   Assuming the initial implementation is correct, I don't see how this could 
evolve aside from version bumps in zstd-jni (at least with the assumption that 
dictionary training is out of scope).
   
   > Without a dictionary, zstd is simply a slightly faster gzip implementation.
   
   It's not just slightly faster, it also has significantly better compression 
ratios. And "slightly" is downplaying it a bit: it can be many factors faster 
(ofc it depends on the level and where the bottleneck is).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
