Werner Daehn created KAFKA-8812:
-----------------------------------
Summary: Rebalance Producers - yes, I mean it ;-)
Key: KAFKA-8812
URL: https://issues.apache.org/jira/browse/KAFKA-8812
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 2.3.0
Reporter: Werner Daehn
Please bear with me. At first this idea sounds stupid, but it has its merits.
How do you build a distributed producer at the moment? You use Kafka Connect,
which in turn requires a cluster that decides which instance produces which
partitions.
On the consumer side it is different. There Kafka itself does the data
distribution. If you have 10 Kafka partitions and 10 consumers, each will get
data for one partition. With 5 consumers, each will get data from two
partitions. And if there is only a single consumer active, it gets all data.
All is managed by Kafka, all you have to do is start as many consumers as you
want.
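
To make the consumer-side behavior concrete, here is a minimal sketch of
splitting partitions evenly across the active members. It is only an
illustration of the idea and not Kafka's actual assignor code; the function
name assign_partitions and the member names are made up for this example.

```python
def assign_partitions(num_partitions, members):
    """Split partitions 1..num_partitions into contiguous ranges, one per member.

    Illustrative only: a simplified range-style split, not Kafka's assignor.
    """
    members = sorted(members)
    if not members:
        return {}
    # Every member gets `base` partitions; the first `extra` members get one more.
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 1
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    for consumer_count in (10, 5, 1):
        names = [f"consumer-{i:02d}" for i in range(consumer_count)]
        print(consumer_count, assign_partitions(10, names))
```

With 10 members each one gets a single partition, with 5 each gets two, and a
single member gets all 10.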
I'd like to suggest something similar for the producers. A producer would tell
Kafka that its source has 10 partitions. The Kafka server then responds with a
list of partitions this instance shall be responsible for. If it is the only
producer, the response would be all 10 partitions. If a second instance starts
up, the first instance would be told to produce data for partitions 1-5 and
the new one for partitions 6-10. If a producer fails to send its alive packet
(heartbeat), a rebalance happens: the remaining active producers are told to
take over its load, and the dead producer gets an error if it tries to send
data again.
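
The flow described above could be sketched roughly as follows. Everything here
is hypothetical: Kafka has no such producer coordinator, and the class
ProducerGroupCoordinator, its methods, and the session timeout are assumptions
made purely to illustrate the proposal.

```python
import time


class ProducerGroupCoordinator:
    """Hypothetical server-side coordinator for producers (not an existing Kafka API)."""

    def __init__(self, num_source_partitions, session_timeout_s=10.0, clock=time.monotonic):
        self.num_source_partitions = num_source_partitions
        self.session_timeout_s = session_timeout_s
        self.clock = clock
        self.last_heartbeat = {}  # producer id -> time of last alive packet
        self.assignment = {}      # producer id -> list of source partitions

    def _rebalance(self):
        # Split source partitions 1..N into contiguous ranges over live producers.
        members = sorted(self.last_heartbeat)
        self.assignment = {}
        if not members:
            return
        base, extra = divmod(self.num_source_partitions, len(members))
        start = 1
        for index, member in enumerate(members):
            count = base + (1 if index < extra else 0)
            self.assignment[member] = list(range(start, start + count))
            start += count

    def join(self, producer_id):
        # A new producer announces itself; every member gets a new assignment.
        self.last_heartbeat[producer_id] = self.clock()
        self._rebalance()
        return self.assignment[producer_id]

    def heartbeat(self, producer_id):
        # Existing producers learn about a changed assignment via the heartbeat reply.
        if producer_id not in self.last_heartbeat:
            raise RuntimeError(f"{producer_id} is not a member, it must rejoin")
        self.last_heartbeat[producer_id] = self.clock()
        return self.assignment[producer_id]

    def expire_dead_members(self):
        # Drop producers whose last alive packet is too old, then rebalance.
        now = self.clock()
        dead = sorted(p for p, seen in self.last_heartbeat.items()
                      if now - seen > self.session_timeout_s)
        for producer_id in dead:
            del self.last_heartbeat[producer_id]
        if dead:
            self._rebalance()
        return dead

    def check_send(self, producer_id, source_partition):
        # A dead or reassigned producer gets an error when it sends data again.
        if source_partition not in self.assignment.get(producer_id, []):
            raise PermissionError(
                f"{producer_id} does not own source partition {source_partition}")
```

In this sketch the first producer to join gets all 10 partitions; once a second
one joins, the first sees partitions 1-5 on its next heartbeat and the second
gets 6-10. How a real broker would deliver the new assignment and fence old
producers is left open here.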
For restarts, the producer rebalance of course also has to send the starting
point from which to resume producing data. It would be best if this were a
user-generated pointer and not the topic offset. Then it could be, e.g., the
database system change number, a database transaction ID, or something similar.
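
As a rough illustration of such a user-generated pointer, the sketch below
keeps one opaque position string per source partition and hands it out
together with an assignment. The class SourcePositionStore, its methods, and
the "scn:..." format are invented for this example and are not part of Kafka.

```python
class SourcePositionStore:
    """Hypothetical store of user-defined resume pointers per source partition."""

    def __init__(self):
        self._positions = {}  # source partition -> opaque pointer chosen by the user

    def commit(self, source_partition, position):
        # The producer records how far it has read its source, e.g. a database SCN.
        self._positions[source_partition] = position

    def assignment_with_positions(self, source_partitions):
        # On rebalance, each assigned partition comes with its last committed
        # pointer, or None if nothing was ever committed for it.
        return {p: self._positions.get(p) for p in source_partitions}


if __name__ == "__main__":
    store = SourcePositionStore()
    store.commit(6, "scn:4711")
    print(store.assignment_with_positions([6, 7]))
```

A producer that takes over partition 6 would then resume reading its source at
the committed pointer, while partition 7 has no pointer yet and would start
from wherever the producer decides.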
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)