mdedetrich commented on PR #2409:
URL: https://github.com/apache/pekko/pull/2409#issuecomment-3506717207

   > zstd is indeed quite complex. In our work, it actually includes a complete 
platform and toolchain for dictionary training, end-to-end deployment, version 
synchronization, etc.
   
   Actually the zstd API is incredibly simple, especially if we decide on the 
dumb solution of doing 
`ByteString.fromArrayUnsafe(zstdCompressCtx.compress(input.toArrayUnsafe()))`. 
In that case the compressor would just be a few lines of code, and it's far 
simpler than what is done in 
[gzip/deflate](https://github.com/apache/pekko/blob/b0fdac259bd57fdd481483f3fe9a7aec6e1ff38a/stream/src/main/scala/org/apache/pekko/stream/impl/io/compression/DeflateCompressor.scala)
 which has to deal with arbitrary behavior in the JVM implementation.
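   To make the "dumb" approach concrete, here is a minimal sketch assuming zstd-jni's `ZstdCompressCtx` and a plain `Flow.map`. The object and method names are illustrative, not the proposed implementation; each incoming chunk becomes an independent zstd frame.

   ```scala
   import com.github.luben.zstd.ZstdCompressCtx
   import org.apache.pekko.NotUsed
   import org.apache.pekko.stream.scaladsl.Flow
   import org.apache.pekko.util.ByteString

   object ZstdCompressorSketch {
     // Compress each ByteString element as a standalone zstd frame.
     // A real implementation would reuse one context per materialization
     // instead of allocating one per element.
     def flow(level: Int = 3): Flow[ByteString, ByteString, NotUsed] =
       Flow[ByteString].map { input =>
         val ctx = new ZstdCompressCtx()
         try {
           ctx.setLevel(level)
           ByteString.fromArrayUnsafe(ctx.compress(input.toArrayUnsafe()))
         } finally ctx.close()
       }
   }
   ```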
   
   The complexity that I am describing comes from doing something very smart 
that is only possible with pekko-streams, i.e. taking downstream demand into 
account when buffering; afaik no other implementation does this.
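   To illustrate (this is a simplified sketch, not the referenced gzip/deflate code): with a custom `GraphStage`, upstream data is only pulled and compressed when downstream has signalled demand, so the stage never buffers ahead of the consumer. The stage name and the injected `compress` function are hypothetical.

   ```scala
   import org.apache.pekko.stream.{Attributes, FlowShape, Inlet, Outlet}
   import org.apache.pekko.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
   import org.apache.pekko.util.ByteString

   // Demand-aware sketch: onPull (downstream demand) is what triggers the
   // pull from upstream, so compression work tracks consumer demand.
   final class DemandAwareCompressStage(compress: ByteString => ByteString)
       extends GraphStage[FlowShape[ByteString, ByteString]] {
     val in: Inlet[ByteString] = Inlet("DemandAwareCompress.in")
     val out: Outlet[ByteString] = Outlet("DemandAwareCompress.out")
     override val shape: FlowShape[ByteString, ByteString] = FlowShape(in, out)

     override def createLogic(attrs: Attributes): GraphStageLogic =
       new GraphStageLogic(shape) with InHandler with OutHandler {
         // Data arrives only because downstream asked for it; compress and emit.
         override def onPush(): Unit = push(out, compress(grab(in)))
         // Request a chunk from upstream only when downstream pulls.
         override def onPull(): Unit = pull(in)
         setHandlers(in, out, this)
       }
   }
   ```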
   
   On the other hand, the complexity you describe comes from your specific use 
case at work. Training dictionaries would be entirely out of scope for this, 
but of course you can specify a dictionary in the same way you can specify a 
compression level, and it's a single line of code.
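   For example (a hypothetical helper, assuming zstd-jni's `loadDict`), wiring in a pre-trained dictionary is one extra call on the same context used to set the level:

   ```scala
   import com.github.luben.zstd.ZstdCompressCtx

   // Illustrative only: the dictionary, if provided, is configured exactly
   // like the compression level, via a single call on the context.
   def configuredCtx(level: Int, dictionary: Option[Array[Byte]]): ZstdCompressCtx = {
     val ctx = new ZstdCompressCtx()
     ctx.setLevel(level)
     dictionary.foreach(d => ctx.loadDict(d)) // the "single line" for dictionary support
     ctx
   }
   ```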
   
   > Therefore, could we consider releasing a separate zstd toolkit, such as 
pekko-zstd or pekko-http-zstd, etc.? This would allow for a more complete 
design and implementation, and also enable independent evolution.
   
   Assuming the initial implementation is correct, I don't see how this could 
evolve aside from version bumps in zstd-jni (at least with the assumption that 
dictionary training is out of scope).
   
   > Without a dictionary, zstd is simply a slightly faster gzip implementation.
   
   It's not just slightly faster, it also has significantly better compression 
ratios. And "slightly" is downplaying it a bit: it can be many factors faster 
(ofc it depends on the level and where the bottleneck is).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
