[ https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-595:
--------------------------------
    Issue Type: Improvement  (was: Bug)

> Producer side compression is unnecessary
> ----------------------------------------
>
>                 Key: KAFKA-595
>                 URL: https://issues.apache.org/jira/browse/KAFKA-595
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>              Labels: feature, features
>
> Compression can be used to store something in less space (less IO) and/or
> transfer it less expensively (better use of network bandwidth). Often the two
> go hand in hand, such as when compressed data is written to a disk: the disk
> I/O takes less time, since fewer bits are transferred, and the data occupies
> less storage on the disk after the transfer. Unfortunately, the time to
> compress the data can exceed the savings gained from transferring less data,
> resulting in overall degradation.
>
> After KAFKA-506, the network usage gains we used to get by compressing data
> at the producers are exceeded by the cost of decompressing and re-compressing
> data at the server side. Compression to save on network costs must be done
> either to reduce the contention in a wide-area network due to multiple
> point-to-point connections OR to efficiently transfer data over low-bandwidth
> networks (cross DC). In the case of producer-server connections, neither is
> typically true, which means we might not benefit from producer side
> compression at all in most production deployments of Kafka. On the contrary,
> it might actually hurt performance since most production deployments turn on
> compression for all topics.
>
> The main benefit of compressing data in Kafka is to efficiently transfer data
> cross DC for setting up mirrored Kafka clusters. The performance benefit
> also applies to real-time consumers, especially when there are multiple groups
> of consumers consuming the same topic. If data is compressed on the server
> side instead, which we do anyway, we can get the I/O savings as well as
> efficient network transfer on the server-consumer links.
>
> I don't have numbers to quantify the performance impact of re-compression
> now, since there are other changes that need to be done to test this
> correctly.
>
> Thoughts?
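
The trade-off described in the issue (CPU time spent on compression versus transfer time saved) can be made concrete with a rough break-even calculation. The sketch below is only illustrative and is not taken from Kafka: the function names `net_saving_seconds` and `measure_deflate`, the sample payload, and the two bandwidth figures are all assumptions, and zlib stands in for whatever codec a deployment actually uses. It also ignores the broker-side decompress and re-compress cost, which would be added to `cpu_seconds`.

```python
import time
import zlib


def net_saving_seconds(raw_bytes, compressed_bytes, cpu_seconds, bandwidth_bytes_per_sec):
    """Seconds saved by sending the compressed payload instead of the raw one.

    A positive result means compression pays for itself; a negative result
    means the CPU time spent exceeds the transfer time saved.
    """
    raw_transfer = raw_bytes / bandwidth_bytes_per_sec
    compressed_transfer = compressed_bytes / bandwidth_bytes_per_sec
    return raw_transfer - (compressed_transfer + cpu_seconds)


def measure_deflate(payload, level=6):
    """Compress payload with zlib and return (compressed_size, seconds_spent)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    return len(compressed), time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical batch of small, repetitive messages.
    payload = b'{"event": "page_view", "member_id": 12345}\n' * 20000
    size, seconds = measure_deflate(payload)

    # Illustrative link speeds only: a fast local link and a slow cross-DC link.
    for label, bandwidth in [("1 Gbit/s LAN", 125_000_000), ("10 Mbit/s WAN", 1_250_000)]:
        saving = net_saving_seconds(len(payload), size, seconds, bandwidth)
        print(f"{label}: net saving {saving:+.4f} s")
```

Under this model the same payload can come out ahead on the slow link and behind on the fast one, which is the asymmetry between producer-server and cross-DC links that the issue argues from. Real numbers would have to come from measurements on an actual cluster.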