[
https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347478#comment-14347478
]
Yasuhiro Matsuda commented on KAFKA-527:
----------------------------------------
This patch introduces BufferingOutputStream, an alternative to
ByteArrayOutputStream. It is backed by a chain of byte arrays, so it does not
copy bytes when growing its capacity. It also has a method that writes the
content directly to a ByteBuffer, so no intermediate array instance is needed
to transfer the content. Lastly, it supports deferred writes: you can reserve a
number of bytes before knowing their value and fill them in later.
MessageWriter (a new class) uses this for writing the CRC value and the
payload length.
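The idea can be sketched roughly as follows. This is a hypothetical
illustration, not the actual patch: the class and method names
(ChainedBufferStream, reserve, Reservation, writeTo) and the segment size are
made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a chained-buffer output stream with deferred writes.
class ChainedBufferStream {
    private final int segmentSize;
    private final List<byte[]> segments = new ArrayList<>();
    private byte[] current;
    private int pos;    // write position within the current segment
    private int total;  // total bytes written

    ChainedBufferStream(int segmentSize) {
        this.segmentSize = segmentSize;
        this.current = new byte[segmentSize];
        segments.add(current);
    }

    void write(byte b) {
        if (pos == current.length) {          // grow by appending a segment;
            current = new byte[segmentSize];  // existing bytes are never copied
            segments.add(current);
            pos = 0;
        }
        current[pos++] = b;
        total++;
    }

    // Reserve n bytes now; fill them once the value is known
    // (e.g. a CRC or a length field that precedes the payload).
    Reservation reserve(int n) {
        int offset = total;
        for (int i = 0; i < n; i++) write((byte) 0);  // placeholder bytes
        return new Reservation(offset, n);
    }

    class Reservation {
        private final int offset, length;
        Reservation(int offset, int length) { this.offset = offset; this.length = length; }

        void fill(byte[] value) {
            if (value.length != length) throw new IllegalArgumentException();
            for (int i = 0; i < length; i++) set(offset + i, value[i]);
        }
    }

    // Random access into the chain; all segments share the same size.
    private void set(int index, byte b) {
        segments.get(index / segmentSize)[index % segmentSize] = b;
    }

    // Copy the content straight into a ByteBuffer, no intermediate array.
    void writeTo(ByteBuffer buffer) {
        int remaining = total;
        for (byte[] seg : segments) {
            int n = Math.min(remaining, seg.length);
            buffer.put(seg, 0, n);
            remaining -= n;
        }
    }

    int size() { return total; }
}
```

A writer would reserve the length field first, stream the payload, then fill
the reservation once the final size is known, avoiding both the resize copies
of ByteArrayOutputStream and the toByteArray() copy at the end.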
On my laptop, I tested the performance using TestLinearWriteSpeed with snappy.
Previously
26.64786026813998 MB per sec
With the patch
35.78401869390889 MB per sec
That is about 34% better throughput.
> Compression support does numerous byte copies
> ---------------------------------------------
>
> Key: KAFKA-527
> URL: https://issues.apache.org/jira/browse/KAFKA-527
> Project: Kafka
> Issue Type: Bug
> Components: compression
> Reporter: Jay Kreps
> Assignee: Yasuhiro Matsuda
> Priority: Critical
> Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch,
> java.hprof.no-compression.txt, java.hprof.snappy.text
>
>
> The data path for compressing or decompressing messages is extremely
> inefficient. We do something like 7 (?) complete copies of the data, often
> for simple things like adding a 4 byte size to the front. I am not sure how
> this went by unnoticed.
> This is likely the root cause of the performance issues we saw in doing bulk
> recompression of data in mirror maker.
> The mismatch between the InputStream and OutputStream interfaces and the
> Message/MessageSet interfaces, which are based on byte buffers, is the cause
> of many of these copies.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)