[ 
https://issues.apache.org/jira/browse/KAFKA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418395#comment-13418395
 ] 

Jay Kreps commented on KAFKA-406:
---------------------------------

Oh yes, and the other design requirement we had was that messages not be 
re-compressed on a fetch request. A simple implementation that didn't have this 
requirement would just be to have the consumer request N messages, and either 
specify to compress or not, and have the server read these into memory, 
decompress if its local log format is compressed, and then batch compress 
exactly the messages the client asked for, and send just that. The problem with 
this is that we have about a 5x read-to-write ratio, so recompressing on each 
read means compressing the same data about 5 times on average. This makes 
consumption way more expensive. I don't think this is a hard requirement, but to 
make that approach fly we would have to demonstrate that the CPU overhead of 
compression would not become a serious bottleneck. I know this won't work with 
GZIP, but it might be possible with Snappy or another faster compression 
algorithm.
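
To make the cost concrete, here is a minimal Python sketch of the "simple implementation" described above. It is not Kafka code: NaiveServer, compress_batch, decompress_batch and the newline framing are all hypothetical names and choices made for illustration. It only counts how many compression passes happen when one write is followed by five fetches.

```python
import gzip

# Hypothetical framing: messages joined with a newline. Kafka's real wire
# format is different; this only illustrates where the CPU work goes.
def compress_batch(messages):
    return gzip.compress(b"\n".join(messages))

def decompress_batch(blob):
    return gzip.decompress(blob).split(b"\n")

class NaiveServer:
    """Recompresses exactly the requested messages on every fetch."""

    def __init__(self):
        self.log = []            # one compressed blob per produce request
        self.compress_calls = 0  # counts compression passes

    def _compress(self, messages):
        self.compress_calls += 1
        return compress_batch(messages)

    def append(self, messages):
        self.log.append(self._compress(messages))

    def fetch(self, offset, count):
        # Decompress the stored log, slice out the request, recompress it.
        stored = [m for blob in self.log for m in decompress_batch(blob)]
        return self._compress(stored[offset:offset + count])

server = NaiveServer()
server.append([b"m%d" % i for i in range(100)])
for _ in range(5):               # the 5x read-to-write ratio from the comment
    server.fetch(offset=10, count=20)
print(server.compress_calls)     # 1 pass for the write + 5 for the reads
```

With one write and five reads the counter ends at 6, so five of the six compression passes are spent on reads. Serving the stored compressed bytes unchanged would avoid those five.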
                
> Gzipped payload is a fully wrapped Message (with headers), not just payload
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-406
>                 URL: https://issues.apache.org/jira/browse/KAFKA-406
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7.1
>         Environment: N/A
>            Reporter: Lorenzo Alberton
>
> When creating a gzipped MessageSet, the collection of Messages is passed to 
> CompressionUtils.compress(), where each message is serialised [1] into a 
> buffer (not just the payload, the full Message with headers, uncompressed), 
> then gzipped, and finally wrapped into another Message [2].
> In other words, the consumer has to unwrap the Message flagged as gzipped, 
> unzip the payload, and unwrap the unzipped payload again as a non-compressed 
> Message. 
> Is this double-wrapping the intended behaviour? 
> [1] messages.foreach(m => m.serializeTo(messageByteBuffer))
> [2] new Message(outputStream.toByteArray, compressionCodec) 
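
The wrapping described in the report can be sketched in Python as follows. This is an illustration under stated assumptions, not the Kafka 0.7 wire format: the 5-byte header (a 1-byte codec flag plus a 4-byte length) and the names wrap, unwrap_all, compress_set and consume are hypothetical.

```python
import gzip
import struct

NO_COMPRESSION, GZIP = 0, 1   # hypothetical codec flags

def wrap(payload, codec=NO_COMPRESSION):
    # Simplified header: 1-byte codec flag + 4-byte big-endian payload length.
    return struct.pack(">BI", codec, len(payload)) + payload

def unwrap_all(buf):
    """Yield (codec, payload) for each message laid end to end in buf."""
    pos = 0
    while pos < len(buf):
        codec, size = struct.unpack_from(">BI", buf, pos)
        pos += 5
        yield codec, buf[pos:pos + size]
        pos += size

def compress_set(payloads):
    # [1] serialise every full message, header included
    inner = b"".join(wrap(p) for p in payloads)
    # [2] gzip the lot and wrap the result in one outer message
    return wrap(gzip.compress(inner), GZIP)

def consume(buf):
    out = []
    for codec, payload in unwrap_all(buf):
        if codec == GZIP:
            # The unzipped payload is itself a sequence of wrapped messages.
            out.extend(consume(gzip.decompress(payload)))
        else:
            out.append(payload)
    return out

print(consume(compress_set([b"a", b"bb"])))
```

The consumer unwraps twice: once for the outer message flagged as gzipped, and once for each inner message recovered from the unzipped payload.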


        
