[ https://issues.apache.org/jira/browse/KAFKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099304#comment-14099304 ]

Joel Koshy commented on KAFKA-687:
----------------------------------

[~junrao] I thought about this some more, and I feel it is better not to design 
the new consumer's partition allocator API in this jira, for a couple of 
reasons:
* We will only know the exact interface requirements and desired 
implementations for the new consumer's allocator once we get to it, i.e., when 
we implement partition assignment in the new consumer. So we will most likely 
change the API at that point anyway.
* The allocation code is not very complicated, so rewriting it in the new 
consumer implementation should not be much work.
* With the "more general" API that we discussed, the range allocation can no 
longer be an exact copy of the existing code (unlike in the original patch). I 
would prefer not to touch the range partitioner in the existing consumer at 
this point, since it is the default that most people use.

So I propose the following: keep the partition allocation interface from the 
original patch and add only one more allocation implementation: roundrobin. 
This scheme is valid only when every consumer instance subscribes with a 
wildcard and all instances use the identical regex (the stream count per 
instance may still differ).
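A minimal sketch of the proposed roundrobin scheme and its precondition, in Python purely for illustration; it is not the code from the patch. The function names, the hypothetical `(pattern, is_wildcard, stream_count)` subscription tuple, and the sort-then-deal ordering are all assumptions. Sorting both the streams and the partitions is one way to let every consumer compute the same assignment independently.

```python
from itertools import cycle


def roundrobin_is_legal(subscriptions):
    """Check the roundrobin precondition (hypothetical subscription format).

    subscriptions maps consumer id -> (pattern, is_wildcard, stream_count).
    Every instance must subscribe with a wildcard and all patterns must be
    identical; stream counts are allowed to differ.
    """
    specs = list(subscriptions.values())
    if not all(is_wildcard for _, is_wildcard, _ in specs):
        return False
    return len({pattern for pattern, _, _ in specs}) <= 1


def roundrobin_assign(partitions, streams):
    """Deal all (topic, partition) pairs, from all topics, over all streams.

    Both inputs are sorted so that every consumer derives the same result.
    Returns a dict: stream id -> list of (topic, partition).
    """
    streams = sorted(streams)
    assignment = {s: [] for s in streams}
    for stream, tp in zip(cycle(streams), sorted(partitions)):
        assignment[stream].append(tp)
    return assignment


# Hypothetical group: c1 runs two streams, c2 runs one, same wildcard.
subs = {"c1": ("events.*", True, 2), "c2": ("events.*", True, 1)}
streams = [f"{c}-{i}" for c, (_, _, n) in subs.items() for i in range(n)]
parts = [("events.a", 0), ("events.a", 1), ("events.b", 0)]

if roundrobin_is_legal(subs):
    print(roundrobin_assign(parts, streams))
```

Here each of the three streams receives one partition even though the partitions span two topics, and the differing stream counts of c1 and c2 do not affect legality.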


> Rebalance algorithm should consider partitions from all topics
> --------------------------------------------------------------
>
>                 Key: KAFKA-687
>                 URL: https://issues.apache.org/jira/browse/KAFKA-687
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Pablo Barrera
>            Assignee: Joel Koshy
>         Attachments: KAFKA-687.patch, KAFKA-687_2014-07-18_15:55:15.patch
>
>
> The current rebalance step, as stated in the original Kafka paper [1], splits 
> the partitions of each topic among all the consumers. So if you have 100 
> topics with 2 partitions each and 10 consumers, only two consumers will be 
> used. That is, for each topic, all partitions are listed and shared among the 
> consumers in the consumer group in order (not randomly).
> If the consumer group is reading from several topics at the same time, it 
> makes sense to split all the partitions from all topics among all the 
> consumers. In the example, that gives 200 partitions in total, 20 per 
> consumer, with all 10 consumers in use.
> The load per topic could differ, and the division should take this into 
> account. However, even a random division should be better than the current 
> algorithm when reading from several topics, and should not harm reading from 
> a few topics with several partitions each.
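The arithmetic in the description can be checked with a small Python sketch. `range_per_topic` applies the usual range split (integer division of partitions by consumers, with the first `n % consumers` consumers taking one extra) to each topic separately, while `roundrobin_all_topics` pools the partitions of all topics before dealing them out. Both treat each consumer as a single stream and use made-up topic and consumer names; they are illustrations, not Kafka's code.

```python
def range_per_topic(topic_partitions, consumers):
    """Split each topic's partitions independently among sorted consumers."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for topic, n_parts in sorted(topic_partitions.items()):
        per_consumer, extra = divmod(n_parts, len(consumers))
        for i, c in enumerate(consumers):
            # Consumers earlier in the sorted order take one extra partition.
            start = per_consumer * i + min(i, extra)
            count = per_consumer + (1 if i < extra else 0)
            assignment[c].extend((topic, p) for p in range(start, start + count))
    return assignment


def roundrobin_all_topics(topic_partitions, consumers):
    """Pool partitions from all topics and deal them out one at a time."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    pooled = sorted((t, p) for t, n in topic_partitions.items() for p in range(n))
    for i, tp in enumerate(pooled):
        assignment[consumers[i % len(consumers)]].append(tp)
    return assignment


# The example from the description: 100 topics x 2 partitions, 10 consumers.
topics = {f"topic{i:03d}": 2 for i in range(100)}
consumers = [f"consumer{i}" for i in range(10)]

for name, assign in [("range", range_per_topic), ("roundrobin", roundrobin_all_topics)]:
    counts = {c: len(tps) for c, tps in assign(topics, consumers).items()}
    print(name, counts)
# range: consumer0 and consumer1 get 100 partitions each, the other eight get 0
# roundrobin: every consumer gets 20
```

The per-topic split leaves eight consumers idle because every topic is divided starting from the same sorted consumer order; pooling first spreads the 200 partitions evenly.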



--
This message was sent by Atlassian JIRA
(v6.2#6252)
