[ 
https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399855#comment-15399855
 ] 

Ben Stopford commented on KAFKA-2063:
-------------------------------------

Thanks for the kicking this one off Andrey. A couple of things come to mind on 
this:

1. I think it would make some sense if we also removed/deprecated the partition 
level maxBytes. Having both is a little confusing to the user. That way each 
request would just fill up to the request-level limitBytes using as many 
partitions as are needed to do so. 

2. I do wonder if randomisation is actually the best approach to this. A 
potentially better approach would be to adjust the order of the partitions 
passed by the consumer. So if we have a request for 8 partitions, we would pass 
partitions 0-7 to the server. The server might use partitions 0-3 to fill the 
response up to limitBytes. The consumer would then send the next fetch request 
with partitions ordered: 4,5,6,7,0,1,2,3. In this way we'd achieve reasonable 
fairness whilst also retaining some level of determinism. 



> Bound fetch response size
> -------------------------
>
>                 Key: KAFKA-2063
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2063
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>
> Currently the only bound on the fetch response size is 
> max.partition.fetch.bytes * num_partitions. There are two problems:
> 1. First this bound is often large. You may chose 
> max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you 
> also need to consume 1k partitions this means you may receive a 1GB response 
> in the worst case!
> 2. The actual memory usage is unpredictable. Partition assignment changes, 
> and you only actually get the full fetch amount when you are behind and there 
> is a full chunk of data ready. This means an application that seems to work 
> fine will suddenly OOM when partitions shift or when the application falls 
> behind.
> We need to decouple the fetch response size from the number of partitions.
> The proposal for doing this would be to add a new field to the fetch request, 
> max_bytes which would control the maximum data bytes we would include in the 
> response.
> The implementation on the server side would grab data from each partition in 
> the fetch request until it hit this limit, then send back just the data for 
> the partitions that fit in the response. The implementation would need to 
> start from a random position in the list of topics included in the fetch 
> request to ensure that in a case of backlog we fairly balance between 
> partitions (to avoid first giving just the first partition until that is 
> exhausted, then the next partition, etc).
> This setting will make the max.partition.fetch.bytes field in the fetch 
> request much less useful and we  should discuss just getting rid of it.
> I believe this also solves the same thing we were trying to address in 
> KAFKA-598. The max_bytes setting now becomes the new limit that would need to 
> be compared to max_message size. This can be much larger--e.g. setting a 50MB 
> max_bytes setting would be okay, whereas now if you set 50MB you may need to 
> allocate 50MB*num_partitions.
> This will require evolving the fetch request protocol version to add the new 
> field and we should do a KIP for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to