[ https://issues.apache.org/jira/browse/KAFKA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217626#comment-17217626 ]

James Yuzawa commented on KAFKA-10470:
--------------------------------------

The maintainer of zstd-jni seems to be back from vacation, so I was going to
ask on the GitHub issue linked above what they think about adding more
buffering. I'm finding that the performance hits occur when Kafka writes a few
bytes at a time to the ZstdOutputStream, typically from the ByteUtils class.
The ZstdInputStream has the same performance hit when Kafka reads a few bytes
at a time. As a stopgap, we could wrap the zstd streams in a
BufferedOutputStream and a BufferedInputStream, as is already done for GZIP:

[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/CompressionType.java#L57]

[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/CompressionType.java#L69]

but for the ZSTD enum value:
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/CompressionType.java#L118]

This would help, but at the cost of not being able to reuse the internal
buffers of the BufferedOutputStream and BufferedInputStream.
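The effect of the stopgap can be illustrated without Kafka or zstd-jni on the classpath. This is a minimal sketch under stated assumptions: CountingOutputStream and CountingInputStream are hypothetical stand-ins for ZstdOutputStream and ZstdInputStream that only count how often they are called, and the 16 KB buffer size is an arbitrary choice, not a value taken from Kafka. It compares how many calls reach the underlying stream when 1,000 single bytes are written or read with and without a buffering wrapper.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for ZstdOutputStream: compresses nothing and only
// counts how many times the "compressor" is invoked.
final class CountingOutputStream extends OutputStream {
    int calls = 0;

    @Override
    public void write(int b) { calls++; }

    @Override
    public void write(byte[] b, int off, int len) { calls++; }
}

// Hypothetical stand-in for ZstdInputStream: serves bytes from memory and
// counts how many times the "decompressor" is asked for data.
final class CountingInputStream extends InputStream {
    private final ByteArrayInputStream source;
    int calls = 0;

    CountingInputStream(byte[] data) { this.source = new ByteArrayInputStream(data); }

    @Override
    public int read() { calls++; return source.read(); }

    @Override
    public int read(byte[] b, int off, int len) { calls++; return source.read(b, off, len); }
}

final class SmallIoBufferingSketch {
    // Assumed buffer size, for illustration only.
    static final int BUFFER_SIZE = 16 * 1024;

    // Writes n single bytes (mimicking the small writes described above) and
    // returns how many calls reached the underlying stream.
    static int writeCalls(int n, boolean buffered) {
        CountingOutputStream sink = new CountingOutputStream();
        try (OutputStream out = buffered ? new BufferedOutputStream(sink, BUFFER_SIZE) : sink) {
            for (int i = 0; i < n; i++) {
                out.write(i & 0xFF);
            }
            out.flush(); // buffered case: one bulk write of all n bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    // Reads n bytes one at a time and returns how many calls reached the
    // underlying stream.
    static int readCalls(int n, boolean buffered) {
        CountingInputStream source = new CountingInputStream(new byte[n]);
        try (InputStream in = buffered ? new BufferedInputStream(source, BUFFER_SIZE) : source) {
            for (int i = 0; i < n; i++) {
                if (in.read() < 0) {
                    throw new IOException("unexpected end of stream");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("writes: unbuffered=" + writeCalls(1000, false) + ", buffered=" + writeCalls(1000, true));
        System.out.println("reads: unbuffered=" + readCalls(1000, false) + ", buffered=" + readCalls(1000, true));
    }
}
```

With the wrappers, 1,000 one-byte calls collapse into a single bulk call in each direction. The trade-off mentioned in the comment still applies: each wrapper allocates its own buffer, and that buffer is not reused across batches.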

> zstd decompression with small batches is slow and causes excessive GC
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-10470
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10470
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Robert Wagner
>            Priority: Major
>
> Similar to KAFKA-5150 but for zstd instead of LZ4, it appears that a large
> decompression buffer (128 KB) created by zstd-jni per batch is causing a
> significant performance bottleneck.
> The next upcoming version of zstd-jni (1.4.5-7) will have a new constructor 
> for ZstdInputStream that allows the client to pass its own buffer.  A similar 
> fix as [PR #2967|https://github.com/apache/kafka/pull/2967] could be used to
> have the ZstdConstructor use a BufferSupplier to reuse the decompression
> buffer.
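The buffer-reuse idea in the issue description can be sketched in the same spirit. CachingBufferSupplier below is a hypothetical, simplified, single-threaded cache keyed by capacity, loosely modeled on the idea behind Kafka's BufferSupplier; it is not Kafka's class, and the hand-off of the buffer to zstd-jni is left as a comment because the exact constructor signature is not given here. The sketch counts how many 128 KB buffers are allocated across a run of batches, with and without returning each buffer after use.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified buffer cache (not thread-safe), keyed by capacity.
final class CachingBufferSupplier {
    private final Map<Integer, Deque<ByteBuffer>> free = new HashMap<>();
    private int allocations = 0;

    // Returns a cached buffer of this capacity if one is free, else allocates one.
    ByteBuffer get(int capacity) {
        Deque<ByteBuffer> pool = free.get(capacity);
        ByteBuffer buf = (pool == null) ? null : pool.pollFirst();
        if (buf == null) {
            allocations++;
            buf = ByteBuffer.allocate(capacity);
        }
        return buf;
    }

    // Resets the buffer's position/limit and makes it available for reuse.
    void release(ByteBuffer buf) {
        buf.clear();
        free.computeIfAbsent(buf.capacity(), k -> new ArrayDeque<>()).addFirst(buf);
    }

    int allocations() { return allocations; }
}

final class DecompressionBufferReuseSketch {
    // 128 KB, the per-batch buffer size cited in the issue description.
    static final int DECOMPRESSION_BUFFER_SIZE = 128 * 1024;

    // Simulates decompressing `batches` batches; each batch borrows a buffer and,
    // when reuse is enabled, returns it once the batch is done.
    static int allocationsFor(int batches, boolean reuse) {
        CachingBufferSupplier supplier = new CachingBufferSupplier();
        for (int i = 0; i < batches; i++) {
            ByteBuffer buf = supplier.get(DECOMPRESSION_BUFFER_SIZE);
            // Hypothetical: pass `buf` to the zstd input stream and decompress the batch here.
            if (reuse) {
                supplier.release(buf);
            }
        }
        return supplier.allocations();
    }

    public static void main(String[] args) {
        System.out.println("no reuse: " + allocationsFor(100, false) + " allocations");
        System.out.println("reuse: " + allocationsFor(100, true) + " allocation(s)");
    }
}
```

Without release, 100 batches allocate 100 buffers of 128 KB each, all of which become garbage; with release, the same buffer is handed back for every batch and only one is ever allocated.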



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
