[ https://issues.apache.org/jira/browse/KAFKA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217626#comment-17217626 ]
James Yuzawa commented on KAFKA-10470:
--------------------------------------

The maintainer of zstd-jni seems to be back from vacation, so I was going to ask on the linked GitHub issue above what they think about adding more buffering.

I'm finding that the performance hits occur when Kafka writes a few bytes at a time, typically from the ByteUtils class, when using the ZstdOutputStream. I also found that the ZstdInputStream has the same performance hit when reading a few bytes at a time.

As a stopgap solution, we could add a BufferedOutputStream and BufferedInputStream like it is done for GZIP:
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/CompressionType.java#L57]
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/CompressionType.java#L69]
but in the zstd enum value:
[https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/CompressionType.java#L118]

This would help, but at the cost of not being able to reuse the internal buffers within the BufferedOutputStream and BufferedInputStream.

> zstd decompression with small batches is slow and causes excessive GC
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-10470
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10470
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Robert Wagner
>            Priority: Major
>
> Similar to KAFKA-5150 but for zstd instead of LZ4, it appears that a large
> decompression buffer (128kb) created by zstd-jni per batch is causing a
> significant performance bottleneck.
> The next upcoming version of zstd-jni (1.4.5-7) will have a new constructor
> for ZstdInputStream that allows the client to pass its own buffer. A similar
> fix as [PR #2967|https://github.com/apache/kafka/pull/2967] could be used to
> have the ZstdConstructor use a BufferSupplier to re-use the decompression
> buffer.
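Below is a minimal sketch of why the buffering stopgap helps. It assumes that the cost of ZstdOutputStream is dominated by the number of write calls when each call carries only a few bytes. zstd-jni is not used here. `CallCountingOutputStream` is a hypothetical stand-in for an expensive stream that only counts calls. `writeLongByteByByte` is a hypothetical helper that mimics the small writes attributed to ByteUtils. `wrapWithBuffer` shows the GZIP-style wrapping applied to that stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SmallWriteBuffering {

    /** Hypothetical stand-in for a compressing stream with a high per-call cost (e.g. a JNI crossing). */
    static final class CallCountingOutputStream extends FilterOutputStream {
        int calls = 0;

        CallCountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            calls++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            calls++; // one call regardless of how many bytes it carries
            out.write(b, off, len);
        }
    }

    /** Wraps the expensive stream in a BufferedOutputStream, as is done for GZIP. */
    static OutputStream wrapWithBuffer(OutputStream expensive, int bufferSize) {
        return new BufferedOutputStream(expensive, bufferSize);
    }

    /** Hypothetical helper: writes a long as 8 single-byte writes (big-endian). */
    static void writeLongByteByByte(OutputStream out, long v) throws IOException {
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) (v >>> shift) & 0xFF);
        }
    }

    /** Returns how many calls reach the expensive stream after writing numLongs longs. */
    static int countCalls(boolean buffered, int numLongs) {
        try {
            CallCountingOutputStream expensive = new CallCountingOutputStream(new ByteArrayOutputStream());
            OutputStream out = buffered ? wrapWithBuffer(expensive, 16 * 1024) : expensive;
            for (int i = 0; i < numLongs; i++) {
                writeLongByteByByte(out, i);
            }
            out.flush(); // pushes any buffered bytes down in a single bulk write
            return expensive.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + countCalls(false, 1000));
        System.out.println("buffered calls:   " + countCalls(true, 1000));
    }
}
```

Without the wrapper, 1000 longs turn into 8000 single-byte calls. With a 16 KB buffer, the same 8000 bytes reach the expensive stream in a single bulk write on flush. The same reasoning applies on the read side with BufferedInputStream. As the comment notes, each wrapper allocates its own buffer per stream, so this trades call overhead for extra per-batch allocation.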
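The buffer-reuse fix proposed in the issue can be sketched as a small pool. The pool returns the same 128 KB decompression buffer for every batch, so a new buffer is not allocated per batch. `SimpleBufferPool` and `allocationsFor` are hypothetical names. This is not Kafka's BufferSupplier, and it does not call the new ZstdInputStream constructor, whose signature is not given here. A single-threaded pool is assumed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class BufferReuseSketch {

    /** Hypothetical single-threaded pool that caches released buffers by capacity. */
    static final class SimpleBufferPool {
        private final Map<Integer, Deque<ByteBuffer>> free = new HashMap<>();
        int allocations = 0;

        ByteBuffer get(int capacity) {
            Deque<ByteBuffer> cached = free.get(capacity);
            ByteBuffer buf = (cached == null) ? null : cached.pollFirst();
            if (buf == null) {
                allocations++; // only allocate when no released buffer is available
                buf = ByteBuffer.allocate(capacity);
            }
            buf.clear(); // reset position/limit before handing it out again
            return buf;
        }

        void release(ByteBuffer buf) {
            free.computeIfAbsent(buf.capacity(), k -> new ArrayDeque<>()).addFirst(buf);
        }
    }

    // 128 KB, the per-batch decompression buffer size described in the issue
    static final int DECOMPRESSION_BUFFER_SIZE = 128 * 1024;

    /** Simulates decoding `batches` small batches, each needing one decompression buffer. */
    static int allocationsFor(int batches, boolean reuse) {
        SimpleBufferPool pool = new SimpleBufferPool();
        int fresh = 0;
        for (int i = 0; i < batches; i++) {
            ByteBuffer buf;
            if (reuse) {
                buf = pool.get(DECOMPRESSION_BUFFER_SIZE);
            } else {
                buf = ByteBuffer.allocate(DECOMPRESSION_BUFFER_SIZE);
                fresh++;
            }
            buf.put((byte) i); // placeholder for the real decompression work
            if (reuse) {
                pool.release(buf); // return the buffer once the batch is done
            }
        }
        return reuse ? pool.allocations : fresh;
    }

    public static void main(String[] args) {
        System.out.println("without reuse: " + allocationsFor(1000, false));
        System.out.println("with reuse:    " + allocationsFor(1000, true));
    }
}
```

Without reuse, every batch allocates a fresh 128 KB buffer that immediately becomes garbage. With the pool, one buffer is allocated and reused across all batches, and that reduction is the GC relief the issue is after.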