Jason Rosenberg created KAFKA-2172:
--------------------------------------
Summary: Round-robin partition assignment strategy too restrictive
Key: KAFKA-2172
URL: https://issues.apache.org/jira/browse/KAFKA-2172
Project: Kafka
Issue Type: Bug
Reporter: Jason Rosenberg
The round-robin partition assignment strategy was introduced for the
high-level consumer, starting with 0.8.2.1. This appears to be a very
attractive feature, but it has an unfortunate restriction that prevents it
from being easily utilized: it requires that all consumers in the
consumer group have identical topic regex selectors and the
same number of consumer threads.
It turns out this is not always the case for our deployments. It's not unusual
to run multiple consumers within a single process (with different topic
selectors), or we might have multiple processes dedicated for different topic
subsets. Agreed, we could change these to have separate group ids for each
sub-topic selector (but unfortunately, that's easier said than done). In several
cases, we do at least have separate client.ids set for each sub-consumer, so it
would be incrementally better if we could at least loosen the requirement such
that the set of topics selected by each groupid/clientid pair is the same.
But, if we want to do a rolling restart for a new version of a consumer config,
the cluster will likely be in a state where it's not possible to have a single
config until the full rolling restart completes across all nodes. This results
in a consumer outage while the rolling restart is happening.
Finally, it's especially problematic if we want to canary a new version for a
period before rolling to the whole cluster.
I'm not sure why this restriction should exist (as it obviously does not exist
for the 'range' assignment strategy). It seems it could be made to work
reasonably well with heterogeneous topic selection and heterogeneous thread
counts. The documentation states that "The round-robin partition assignor lays
out all the available partitions and all the available consumer threads. It
then proceeds to do a round-robin assignment from partition to consumer thread."
If the assignor can "lay out all the available partitions and all the available
consumer threads", it should be able to uniformly assign partitions to the
available threads. In each case, if a thread belongs to a consumer that
doesn't have that partition selected, just move to the next available thread
that does have the selection, etc.
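To make the proposal concrete, here is a rough sketch of the "skip to the next
thread that has the topic selected" idea. It is not the existing assignor code
and does not use Kafka's internal types: the class name LooseRoundRobinSketch,
the assign method, and the "topic-partition" string form are all hypothetical,
chosen only for illustration. It assumes the partitions and consumer threads
have already been laid out in a deterministic order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and data shapes are assumptions, not Kafka APIs.
final class LooseRoundRobinSketch {

    /**
     * @param partitionCounts topic -> number of partitions, in layout order
     * @param subscriptions   consumer thread id -> topics it selects, in layout order
     * @return consumer thread id -> assigned partitions, as "topic-partition" strings
     */
    static Map<String, List<String>> assign(
            Map<String, Integer> partitionCounts,
            Map<String, Set<String>> subscriptions) {

        List<String> threads = new ArrayList<>(subscriptions.keySet());
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String thread : threads) {
            assignment.put(thread, new ArrayList<>());
        }

        int cursor = 0; // index of the thread that is next in the rotation
        for (Map.Entry<String, Integer> entry : partitionCounts.entrySet()) {
            String topic = entry.getKey();
            for (int partition = 0; partition < entry.getValue(); partition++) {
                // Walk at most one full lap, skipping threads whose consumer
                // has not selected this topic.
                for (int step = 0; step < threads.size(); step++) {
                    int index = (cursor + step) % threads.size();
                    String candidate = threads.get(index);
                    if (subscriptions.get(candidate).contains(topic)) {
                        assignment.get(candidate).add(topic + "-" + partition);
                        cursor = (index + 1) % threads.size();
                        break;
                    }
                }
                // If no thread selects the topic, the partition stays unassigned.
            }
        }
        return assignment;
    }
}
```

This simple rotation does not guarantee a perfectly even spread when selections
differ a lot between consumers (a thread that is the only subscriber of a large
topic will still get all of it), but it needs neither identical selectors nor
identical thread counts.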