[ https://issues.apache.org/jira/browse/KAFKA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418395#comment-13418395 ]
Jay Kreps commented on KAFKA-406:
---------------------------------

Oh yes, and the other design requirement we had was that messages not be re-compressed on a fetch request.

A simple implementation that didn't have this requirement would just be to have the consumer request N messages and specify whether or not to compress them. The server would read these into memory, decompress them if its local log format is compressed, then batch-compress exactly the messages the client asked for and send just that.

The problem with this is that we have about a 5x read-to-write ratio, so recompressing on each read means recompressing the same data 5 times on average. This makes consumption way more expensive. I don't think this is a hard requirement, but to make that approach fly we would have to demonstrate that the CPU overhead of compression would not become a serious bottleneck. I know this won't work with GZIP, but it might be possible to do it with snappy or a faster compression algo.

> Gzipped payload is a fully wrapped Message (with headers), not just payload
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-406
>                 URL: https://issues.apache.org/jira/browse/KAFKA-406
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7.1
>         Environment: N/A
>            Reporter: Lorenzo Alberton
>
> When creating a gzipped MessageSet, the collection of Messages is passed to
> CompressionUtils.compress(), where each message is serialised [1] into a
> buffer (not just the payload, the full Message with headers, uncompressed),
> then gzipped, and finally wrapped into another Message [2].
> In other words, the consumer has to unwrap the Message flagged as gzipped,
> unzip the payload, and unwrap the unzipped payload again as a non-compressed
> Message.
> Is this double-wrapping the intended behaviour?
> [1] messages.foreach(m => m.serializeTo(messageByteBuffer))
> [2] new Message(outputStream.toByteArray, compressionCodec)
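
For readers trying to picture the double-wrapping described in the report, here is a minimal sketch in Python. It assumes a made-up toy framing (length, codec flag, CRC32, payload) that is NOT Kafka's actual wire format, and the names `serialize`, `parse`, `compress_set` and `consume` are hypothetical helpers invented for this illustration. It only mirrors the two steps quoted in [1] and [2]: full messages are serialised with their headers, the resulting buffer is gzipped, and the compressed bytes become the payload of one outer message.

```python
import gzip
import struct
import zlib

# Hypothetical toy framing, NOT Kafka's real wire format:
# 4-byte big-endian length | 1-byte codec flag | 4-byte CRC32 of payload | payload
CODEC_NONE = 0
CODEC_GZIP = 1
HEADER = struct.Struct(">IBI")


def serialize(payload: bytes, codec: int = CODEC_NONE) -> bytes:
    """Frame one payload as a toy message, headers included."""
    body_len = 1 + 4 + len(payload)  # length covers codec flag + CRC + payload
    return HEADER.pack(body_len, codec, zlib.crc32(payload)) + payload


def parse(buf: bytes):
    """Yield (codec, payload) for each toy message stored back to back in buf."""
    offset = 0
    while offset < len(buf):
        body_len, codec, crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        end = offset + 4 + body_len
        payload = buf[start:end]
        if zlib.crc32(payload) != crc:
            raise ValueError("checksum mismatch")
        yield codec, payload
        offset = end


def compress_set(payloads):
    """Producer side: serialise full messages, gzip them, wrap the result."""
    inner = b"".join(serialize(p) for p in payloads)    # like [1]: headers included
    return serialize(gzip.compress(inner), CODEC_GZIP)  # like [2]: outer wrapper


def consume(buf: bytes):
    """Consumer side: unwrap the outer message, gunzip, unwrap the inner ones."""
    out = []
    for codec, payload in parse(buf):
        if codec == CODEC_GZIP:
            out.extend(consume(gzip.decompress(payload)))
        else:
            out.append(payload)
    return out
```

With this toy format, `consume(compress_set([b"a", b"b", b"c"]))` gives back the three original payloads, but only after two rounds of header parsing around a single decompression. One consequence of this layout is that a server can hand the outer message to a consumer as stored, without decompressing or recompressing anything on a fetch.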