[ 
https://issues.apache.org/jira/browse/KAFKA-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509318#comment-13509318
 ] 

Jay Kreps commented on KAFKA-598:
---------------------------------

So I guess the one hard requirement is that we have to be able to tell people 
how much memory our client will use. People have to set a heap size, and if we 
can't tell them how much memory we will use without crashing their app they 
will be unhappy.

Let's consider a hard case: 5 consumer processes, 100 topics with 5 partitions 
each, and a queue size of 5. With a fetch size of 1MB and an upper fetch size 
of 50MB, what is a safe heap size for this person to configure so we can 
assure them we won't crash their app?
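To make the hard case concrete, here is an illustrative back-of-the-envelope calculation (my numbers, not from the issue itself; it assumes every queue slot can hold one fetched chunk, and "upper fetch size" is the temporary per-partition size used when a large message is hit):

```python
# Illustrative heap bound for the hard case above, per consumer process.
MB = 1024 * 1024

topics = 100
partitions_per_topic = 5
queue_size = 5              # buffered chunks per partition
fetch_size = 1 * MB         # normal per-partition fetch size
upper_fetch_size = 50 * MB  # temporary fetch size for large messages

partitions = topics * partitions_per_topic  # 500

# Steady state: every queue slot holds a normal-sized chunk.
steady_state = partitions * queue_size * fetch_size
print(steady_state // MB, "MB")  # 2500 MB

# Worst case: every slot holds an upper-fetch-size chunk.
worst_case = partitions * queue_size * upper_fetch_size
print(worst_case // (1024 * MB), "GB")  # ~122 GB
```

Even the steady state is 2.5GB of buffer space per process, and the worst case is two orders of magnitude larger, which is why a simple "multiply everything" heap recommendation is untenable.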

This is why I don't really see pipelined fetches helping. They have to stitch 
the pieces together into a single ByteBuffer in the end, so fetching it in 
pieces doesn't really reduce the memory footprint. Supporting non-memory-resident 
messages is possible but would be a massive re-architecture of almost everything.

Another option I don't think you covered would be to change the fetch request 
so that it takes a single size for the whole request rather than one per 
partition. That would solve one dilemma we currently have: many topics could 
have no new data, but we have to budget space for them as if they would (since 
they might); doing this on the server side, we could be a little bit smarter. 
However, we would need to ensure that one partition with infinite data to read 
can't starve out the other partitions.
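One way to keep a single request-level size fair would be to have the broker dole out the budget round-robin in small chunks across the partitions that have data. This is only a hypothetical sketch of that idea (the function and chunk size are mine, not anything in Kafka):

```python
# Hypothetical server-side allocator for a single request-wide fetch size.
# Round-robin in small chunks so one partition with unbounded data cannot
# starve the rest.
def allocate(available, total_budget, chunk=64 * 1024):
    """available: dict partition -> bytes of unread data.
    Returns dict partition -> bytes granted, with the total <= total_budget."""
    granted = {p: 0 for p in available}
    remaining = dict(available)
    budget = total_budget
    while budget > 0 and any(remaining.values()):
        for p in sorted(remaining):
            if remaining[p] == 0 or budget == 0:
                continue
            take = min(chunk, remaining[p], budget)
            granted[p] += take
            remaining[p] -= take
            budget -= take
    return granted

# One "hot" partition with effectively infinite data still leaves room
# for the partitions with less data:
grants = allocate({"hot": 10**9, "a": 100_000, "b": 50_000},
                  total_budget=300_000)
# grants == {"a": 100000, "b": 50000, "hot": 150000}
```

The small chunk size is what provides the fairness: a partition only ever gets one chunk ahead of the others per pass, so the hot partition absorbs leftover budget rather than claiming it all up front.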
                
> decouple fetch size from max message size
> -----------------------------------------
>
>                 Key: KAFKA-598
>                 URL: https://issues.apache.org/jira/browse/KAFKA-598
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Joel Koshy
>         Attachments: KAFKA-598-v1.patch
>
>
> Currently, a consumer has to set fetch size larger than the max message size. 
> This increases the memory footprint on the consumer, especially when a large 
> number of topic/partition is subscribed. By decoupling the fetch size from 
> max message size, we can use a smaller fetch size for normal consumption and 
> when hitting a large message (hopefully rare), we automatically increase 
> fetch size to max message size temporarily.

