[ 
https://issues.apache.org/jira/browse/KAFKA-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880367#comment-15880367
 ] 

Armin Braun commented on KAFKA-1895:
------------------------------------

I think the way the threading works is somewhat separate from this issue, at 
least in the current state of affairs. A clean solution here would enable a 
cleanly separated I/O thread, but it would not require one for now. (At least, 
I think we need a solution that works within the current threading framework; 
otherwise this turns into an endless discussion about a rewrite that is 
unrealistically large, at least as a single step.)

Right now the problem isn't only that there is no access to the raw responses; 
the issue is also very much in how the raw data is handled internally, e.g. by 
constantly reallocating ByteBuffers.
The main takeaway here, as I see it, is implementing logic that can read 
messages of varying size into reused buffers while still providing stable 
iteration over the read result. That in turn would make it possible to hand 
the raw data to the user with a low (zero? :)) GC footprint.
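To make the idea concrete, here is a minimal sketch (hypothetical class and method names, not actual Kafka code) of reading length-prefixed messages of varying size into one reused buffer while exposing stable read-only slices for iteration:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read length-prefixed messages of varying size into a
// single reused buffer, handing out read-only slices for stable iteration
// instead of allocating a fresh ByteBuffer per message.
public class ReusableBufferReader {
    private ByteBuffer buffer;

    public ReusableBufferReader(int initialCapacity) {
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    // Grow rarely (amortized) instead of reallocating on every read.
    private void ensureCapacity(int needed) {
        if (buffer.capacity() < needed) {
            this.buffer = ByteBuffer.allocate(Math.max(needed, buffer.capacity() * 2));
        }
    }

    // Copies one raw length-prefixed chunk into the reused buffer and returns
    // read-only slices over it; the slices stay valid until the next read().
    public List<ByteBuffer> read(byte[] rawChunk) {
        ensureCapacity(rawChunk.length);
        buffer.clear();
        buffer.put(rawChunk);
        buffer.flip();
        List<ByteBuffer> messages = new ArrayList<>();
        while (buffer.remaining() >= 4) {
            int size = buffer.getInt();          // 4-byte length prefix
            ByteBuffer slice = buffer.slice();   // view over the reused backing array
            slice.limit(size);
            messages.add(slice.asReadOnlyBuffer());
            buffer.position(buffer.position() + size);
        }
        return messages;
    }
}
```

The point is that the backing array is reused across fetches; only small slice objects are created per message, which is what keeps the GC footprint low.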

> Investigate moving deserialization and decompression out of KafkaConsumer
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-1895
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1895
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Jay Kreps
>
> The consumer implementation in KAFKA-1760 decompresses fetch responses and 
> deserializes them into ConsumerRecords which are then handed back as the 
> result of poll().
> There are several downsides to this:
> 1. It is impossible to scale serialization and decompression work beyond the 
> single thread running the KafkaConsumer.
> 2. The results can come back during the processing of other calls such as 
> commit() etc which can result in caching these records a little longer.
> An alternative would be to have ConsumerRecords wrap the actual compressed 
> serialized MemoryRecords chunks and do the deserialization during iteration. 
> This way you could scale this over a thread pool if needed.
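The alternative described in the issue above can be sketched roughly as follows (hypothetical names, not the actual ConsumerRecords/MemoryRecords API): a records view that holds the raw serialized chunks and defers deserialization to iteration time, so the caller can spread that work over a thread pool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: wrap raw serialized payloads and deserialize lazily
// during iteration rather than eagerly inside poll().
public class LazyRecords<T> implements Iterable<T> {
    private final List<byte[]> rawChunks;            // serialized payloads as fetched
    private final Function<byte[], T> deserializer;  // applied per record, on demand

    public LazyRecords(List<byte[]> rawChunks, Function<byte[], T> deserializer) {
        this.rawChunks = rawChunks;
        this.deserializer = deserializer;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<byte[]> raw = rawChunks.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { return raw.hasNext(); }
            // Deserialization cost is paid here, on whichever thread iterates.
            @Override public T next() { return deserializer.apply(raw.next()); }
        };
    }
}
```

Handing each such iterable to a worker in an ExecutorService would then parallelize deserialization without changing what poll() returns structurally.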



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
