[ https://issues.apache.org/jira/browse/KAFKA-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879445#comment-15879445 ]
Armin Braun commented on KAFKA-1895:
------------------------------------

{quote}
Maybe a simpler way to achieve that would be to have a new Deserializer type which works with byte buffers instead of byte arrays?
{quote}

Having buffers here would be better than arrays and would already allow for a lot of optimizations. The downside I see is that reuse of the deserialized object would not come quite so naturally. You would also end up having to support two kinds of deserializers, which would add a lot of complication to the code just to give users the same thing they would get from the RawRecordIterator interface - the option to reuse the deserialized object.

I agree here to some degree though:

{quote}
would make the consumer more confusing (we've had a tough enough time explaining how the current API works).
{quote}

Yes, this would make it more confusing, but on the other hand at least the existing API would not change. If you do this via the deserializers, you could probably keep the external API a little simpler (though also slower), but at a pretty high price in added complexity in the codebase. My argument for why adding another method is not so bad is that the interface is already fairly complex: adding one more method with proper javadoc will not, in my opinion, be what tips anyone who understood the interface before into no longer understanding it. Admittedly not the best argument in the world, but I feel it's a reasonable tradeoff if you take into account the size of the necessary change (or the added complexity of supporting two different deserializer interfaces).

{quote}
it might not be a great idea to give users direct access to the underlying buffers.
{quote}

I would solve this by returning read-only buffers with the proper limit and position for a record set. This means the user must do some bounds checking, but Hadoop's RawKeyValueIterator requires the same and it is not an issue in my opinion. The other option would be to wrap the buffers in, say, `DataInput` to make the interface safer, at the cost of a slight overhead (and the fact that some users may prefer working with buffers rather than with DataInput). Rough, purely illustrative sketches of both the buffer-based deserializer and the read-only view idea are appended below.

> Investigate moving deserialization and decompression out of KafkaConsumer
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-1895
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1895
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Jay Kreps
>
> The consumer implementation in KAFKA-1760 decompresses fetch responses and
> deserializes them into ConsumerRecords which are then handed back as the
> result of poll().
> There are several downsides to this:
> 1. It is impossible to scale serialization and decompression work beyond the
> single thread running the KafkaConsumer.
> 2. The results can come back during the processing of other calls such as
> commit() etc which can result in caching these records a little longer.
> An alternative would be to have ConsumerRecords wrap the actual compressed
> serialized MemoryRecords chunks and do the deserialization during iteration.
> This way you could scale this over a thread pool if needed.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
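To make the buffer-based deserializer idea more concrete, here is a minimal sketch. `BufferDeserializer` is a hypothetical name, not an existing Kafka interface; the `configure`/`close` methods only mirror the shape of the current `Deserializer`, and the optional `reuse` parameter is one possible way the object-reuse option could be exposed:

{code}
import java.io.Closeable;
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical sketch -- not an existing Kafka interface. A buffer-oriented
// deserializer reads record bytes directly from a ByteBuffer slice of the
// fetch buffer instead of requiring a byte[] copy per record.
public interface BufferDeserializer<T> extends Closeable {

    // Mirrors the existing Deserializer#configure contract.
    void configure(Map<String, ?> configs, boolean isKey);

    /**
     * Deserialize one record from the given buffer, whose position and limit
     * cover exactly that record's bytes. Implementations may fill and return
     * the 'reuse' instance (if non-null) to avoid allocating a new object
     * per record.
     */
    T deserialize(String topic, ByteBuffer data, T reuse);

    @Override
    void close();
}
{code}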
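And a sketch of the read-only buffer idea, using only standard java.nio calls; the class and method names are made up for illustration. The consumer would hand out a read-only slice whose position and limit cover exactly one record (or record set), so the caller cannot mutate the shared fetch buffer:

{code}
import java.nio.ByteBuffer;

// Illustrative helper: expose a record's bytes as a read-only slice of the
// shared fetch buffer. The caller gets an independent position/limit but
// cannot modify the underlying memory.
final class RecordViews {

    static ByteBuffer recordView(ByteBuffer fetchBuffer, int offset, int length) {
        ByteBuffer view = fetchBuffer.asReadOnlyBuffer(); // shares content, rejects writes
        view.position(offset);
        view.limit(offset + length);
        return view.slice(); // slice starts at position 0 and spans only this record
    }
}
{code}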