[ https://issues.apache.org/jira/browse/KAFKA-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879543#comment-15879543 ]

Armin Braun commented on KAFKA-1895:
------------------------------------

{quote}
   I don't know, supporting two deserializers sounds a lot easier than 
supporting two poll() methods.
{quote}

Fair point, though this only holds if the deserializer is not able to reuse the
deserialized object (admittedly, if we don't go that far, then it's probably
not so hard to provide that interface). At least with interfaces like Hadoop's
`Writable`, or with Avro, the throughput you can achieve will take quite a hit
if you don't have the ability to reuse objects here.
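To make the object-reuse point concrete, here is a minimal sketch of what a reuse-friendly deserializer could look like. The extra `reuse` parameter and all names (`ReusableDeserializer`, `LongHolder`) are assumptions for illustration, not Kafka's actual `Deserializer` API:

```java
import java.nio.ByteBuffer;

// Hypothetical reuse-friendly variant of a deserializer interface: the caller
// may pass in a previously allocated object to be filled in instead of
// allocating a new one per record (similar in spirit to Hadoop's Writable).
interface ReusableDeserializer<T> {
    T deserialize(String topic, byte[] data, T reuse);
}

// A mutable holder for a long value, so deserialization can avoid a
// per-record allocation.
final class LongHolder {
    long value;
}

final class LongHolderDeserializer implements ReusableDeserializer<LongHolder> {
    @Override
    public LongHolder deserialize(String topic, byte[] data, LongHolder reuse) {
        LongHolder out = (reuse != null) ? reuse : new LongHolder();
        out.value = ByteBuffer.wrap(data).getLong();
        return out; // same instance as "reuse" when one was supplied
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        LongHolderDeserializer d = new LongHolderDeserializer();
        LongHolder holder = new LongHolder();
        byte[] payload = ByteBuffer.allocate(8).putLong(42L).array();
        LongHolder result = d.deserialize("test-topic", payload, holder);
        System.out.println(result.value);     // 42
        System.out.println(result == holder); // true: object was reused
    }
}
```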

{quote}
something like what Netty's ByteBuf provides).
{quote}

Yes, exactly: you need shared state (buffers plus pointers/lengths) between the
iterator and the consumer. But I think moving the pointers with `next()` and
sharing knowledge of all valid offsets/lengths between iterator and consumer
makes this a relatively straightforward problem, and one that has been solved
elsewhere already.
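A minimal sketch of that shared-state idea, assuming a flat buffer of length-prefixed records. The layout and names (`RecordView`, `SharedBufferIterator`) are illustrative, not Kafka's actual `MemoryRecords` format:

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

// A zero-copy view into the shared buffer: just an (offset, length) pair.
final class RecordView {
    ByteBuffer buffer; // shared with the consumer, never copied
    int offset;
    int length;
}

// Iterator that advances a pointer through the shared buffer on next(),
// handing out views instead of copying each record.
final class SharedBufferIterator implements Iterator<RecordView> {
    private final ByteBuffer buffer;                  // shared state
    private final RecordView view = new RecordView(); // reused every next()
    private int position = 0;

    SharedBufferIterator(ByteBuffer buffer) {
        this.buffer = buffer;
        this.view.buffer = buffer;
    }

    @Override
    public boolean hasNext() {
        return position + 4 <= buffer.limit(); // room for a length prefix
    }

    @Override
    public RecordView next() {
        if (!hasNext()) throw new NoSuchElementException();
        int length = buffer.getInt(position); // 4-byte length prefix
        view.offset = position + 4;
        view.length = length;
        position = view.offset + length;      // move the shared pointer
        return view;                          // same object every time
    }
}
```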

> Investigate moving deserialization and decompression out of KafkaConsumer
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-1895
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1895
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Jay Kreps
>
> The consumer implementation in KAFKA-1760 decompresses fetch responses and 
> deserializes them into ConsumerRecords which are then handed back as the 
> result of poll().
> There are several downsides to this:
> 1. It is impossible to scale serialization and decompression work beyond the 
> single thread running the KafkaConsumer.
> 2. The results can come back during the processing of other calls such as 
> commit() etc which can result in caching these records a little longer.
> An alternative would be to have ConsumerRecords wrap the actual compressed 
> serialized MemoryRecords chunks and do the deserialization during iteration. 
> This way you could scale this over a thread pool if needed.
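The alternative described above (records wrapping raw chunks, deserialized lazily during iteration) could be sketched roughly as follows. The names (`LazyRecords`, the `String` payload type) are made up for this example and stand in for the real `ConsumerRecords`/`MemoryRecords` types:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of "deserialize during iteration": the records object
// holds raw serialized chunks and only deserializes each record when the
// iterator reaches it, so the work runs on whichever thread iterates rather
// than inside poll() on the consumer thread.
final class LazyRecords implements Iterable<String> {
    private final List<byte[]> rawChunks; // e.g. undecoded fetch payloads

    LazyRecords(List<byte[]> rawChunks) {
        this.rawChunks = rawChunks;
    }

    @Override
    public Iterator<String> iterator() {
        Iterator<byte[]> raw = rawChunks.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return raw.hasNext();
            }

            @Override
            public String next() {
                // Deserialization happens here, not at fetch time; a thread
                // pool could run many of these iterators in parallel.
                return new String(raw.next(), StandardCharsets.UTF_8);
            }
        };
    }
}

public class LazyDemo {
    public static void main(String[] args) {
        LazyRecords records = new LazyRecords(Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                "b".getBytes(StandardCharsets.UTF_8)));
        for (String s : records) {
            System.out.println(s); // prints "a" then "b"
        }
    }
}
```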



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)