This is a bit frustrating since you keep saying that the load is not balanced, 
but the load actually is balanced, it's just balanced in an approximate 
fashion.  If you need exact balancing (for example, because you're creating a 
job scheduler or something), then you need to use a different strategy.  One 
example would be using an external atomic counter to determine what partition 
the producers should send the messages to.  Another would be using a single 
consumer with fanout.  I think this is outside the scope of Kafka, at least if 
I understand the problem here (?)
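
To make the difference concrete, here is a toy simulation (plain Python, not Kafka code; SharedCounter and both function names are made up for illustration). It assumes every producer sends one message per round. Independent round-robin producers that happen to be in sync pile onto one partition in any given round, even though the totals even out over a full cycle, while a shared counter keeps every round exactly balanced.

```python
import threading
from collections import Counter


class SharedCounter:
    """Hypothetical stand-in for an external atomic counter shared by all producers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def get_and_increment(self):
        # The lock makes read-then-increment atomic across threads.
        with self._lock:
            value = self._value
            self._value += 1
            return value


def independent_round_robin(offsets, num_partitions, rounds):
    """Each producer cycles through partitions from its own starting offset.

    Returns one Counter per round mapping partition -> messages received.
    """
    return [
        Counter((offset + r) % num_partitions for offset in offsets)
        for r in range(rounds)
    ]


def shared_counter_round_robin(num_producers, num_partitions, rounds):
    """Every send asks the shared counter which partition comes next."""
    counter = SharedCounter()
    return [
        Counter(counter.get_and_increment() % num_partitions
                for _ in range(num_producers))
        for _ in range(rounds)
    ]


if __name__ == "__main__":
    # Three producers that happen to be in sync, three partitions.
    synced = independent_round_robin([0, 0, 0], num_partitions=3, rounds=3)
    print("independent, per round:", [dict(c) for c in synced])
    print("independent, total:    ", dict(sum(synced, Counter())))

    shared = shared_counter_round_robin(3, num_partitions=3, rounds=3)
    print("shared counter, per round:", [dict(c) for c in shared])
```

In a real deployment the counter would have to live somewhere all producers can reach, which adds a round trip and a point of contention to every send; that cost is the price of exact balancing.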

best,
Colin

On Mon, Jun 15, 2020, at 11:32, Vinicius Scheidegger wrote:
> Hi Colin,
> 
> One producer shouldn't need to know about the other to distribute the load
> equally, but what Kafka has now is roughly equal...
> If you have a single producer, RoundRobinPartitioner works fine; if you have
> 10 producers you can have 7/8 messages in one partition while another
> partition has none (producers are in sync - which happened a couple of times
> in our tests).
> 
> Producer0 getNext() = partition0
> Producer1 getNext() = partition0
> Producer2 getNext() = partition0
> 
> A link to some of our test data prints:
> https://imgur.com/a/ha9OQMj
> 
> This, depending on how intensive (slow) your consumption rate is, may be a
> problem as it will generate enqueuing.
> We use Kafka as a messaging protocol in a big (and at some points heavily
> loaded) machine learning flow. For high-throughput (lightweight) processing,
> enqueuing is not an issue, although we saw it happening, but for heavy
> processes we are unable to balance the load equally.
> 
> We currently use the DefaultPartitioner and Kafka algorithm (murmur2 hash
> of the key) to decide the partition.
> We noticed enqueuing and timeouts while several consumers were idle - which
> made us take a closer look at how the load is balanced.
> 
> I believe the only way to perform equal load balance without having to know
> other producers would be to do it on the Broker side. Do you agree?
> 
> Thanks,
> 
> 
> 
> On Mon, Jun 15, 2020 at 7:32 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi Vinicius,
> >
> > It's actually not necessary for one producer to know about the others to
> > get an even distribution across partitions, right?  All that's really
> > required is that all producers produce a roughly equal amount of data to
> > each partition, which is what RoundRobinPartitioner is designed to do.  In
> > mathematical terms, a mixture of several uniform distributions over the
> > partitions is itself uniform.
> >
> > (There is a bug in RRP right now, KAFKA-9965, but it's not related to what
> > we're talking about now and we have a fix ready.)
> >
> > cheers,
> > Colin
> >
> >
> > On Sun, Jun 14, 2020, at 14:26, Vinicius Scheidegger wrote:
> > > Hi Colin,
> > >
> > > Thanks for the reply. Actually, the RoundRobinPartitioner won't do an
> > > equal distribution when working with multiple producers. One producer
> > > does not know about the others. If you consider that producers are
> > > randomly producing messages, in the worst-case scenario all producers
> > > can be synced and one could have as many messages in a single partition
> > > as the number of producers.
> > > It's easy to generate evidence of it.
> > >
> > > I have asked this question on the users mailing list too (and on Slack
> > > and on Stack Overflow).
> > >
> > > Kafka currently does not have means to do a round robin across multiple
> > > producers or on the broker side.
> > >
> > > This means there is currently NO GUARANTEE of equal distribution across
> > > partitions as the partition election is decided by the producer.
> > >
> > > The result is unbalanced consumption when working with consumer groups,
> > > and the options are: creating a custom shared partitioner, relying on
> > > Kafka's random partitioning, or introducing a middleman between topics
> > > (all of them having big cons).
> > >
> > > I thought of asking here to see whether this is a topic that could
> > > concern other developers (and maybe understand whether this could be a
> > > KIP discussion).
> > >
> > > Maybe I'm missing something... I would like to know.
> > >
> > > According to my interpretation of the code (I just read through some
> > > classes), there is currently no way to do partition balancing on the
> > > broker - the producer sends messages directly to partition leaders, so
> > > the partition currently needs to be chosen on the producer.
> > >
> > > I understand that in order to perform round robin across partitions of a
> > > topic when working with multiple producers, some development needs to be
> > > done. Am I right?
> > >
> > >
> > > Thanks
> > >
> > >
> > > On Fri, Jun 12, 2020, 10:57 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi Vinicius,
> > > >
> > > > This question seems like a better fit for the user mailing list rather
> > > > than the developer mailing list.
> > > >
> > > > Anyway, if I understand correctly, you are asking if the producer can
> > > > choose to assign partitions in a round-robin fashion rather than based
> > > > on the key.  The answer is, you can, by using RoundRobinPartitioner
> > > > (again, if I'm understanding the question correctly).
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Tue, Jun 9, 2020, at 00:48, Vinicius Scheidegger wrote:
> > > > > Anyone?
> > > > >
> > > > > On Fri, Jun 5, 2020 at 2:42 PM Vinicius Scheidegger <
> > > > > vinicius.scheideg...@gmail.com> wrote:
> > > > >
> > > > > > > Does anyone know how I could balance the load so that messages are
> > > > > > > distributed equally to all consumers within the same consumer group
> > > > > > > when there are multiple producers?
> > > > > >
> > > > > > > Is this a conceptual flaw in Kafka (wasn't it designed for equal
> > > > > > > distribution with multiple producers?), or am I missing something?
> > > > > > > I've asked on Stack Overflow, on the Kafka users mailing list, here
> > > > > > > (on Kafka Devs) and on Slack - and still have no definitive answer
> > > > > > > (actually, most of the time I got no answer at all).
> > > > > >
> > > > > > > Would something like this even be possible in the way Kafka is
> > > > > > > currently designed?
> > > > > > > How does proposing a KIP work?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, May 28, 2020, 3:44 PM Vinicius Scheidegger <
> > > > > > vinicius.scheideg...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >>
> > > > > >> I'm trying to understand a little bit more about how Kafka works.
> > > > > > >> I have a design with multiple producers writing to a single topic
> > > > > > >> and multiple consumers in a single Consumer Group consuming messages
> > > > > > >> from this topic.
> > > > > >>
> > > > > > >> My idea is to distribute the messages from all producers equally.
> > > > > > >> From reading the documentation I understood that the partition is
> > > > > > >> always selected by the producer. Is that correct?
> > > > > >>
> > > > > > >> I'd also like to know if there is an out-of-the-box option to assign
> > > > > > >> the partition via a round robin *on the broker side* to guarantee
> > > > > > >> equal distribution of the load - if possible to each consumer, but if
> > > > > > >> not possible, at least to each partition.
> > > > > >>
> > > > > > >> If my understanding is correct, it looks like in a multiple-producer
> > > > > > >> scenario there is a lack of support from Kafka regarding load
> > > > > > >> balancing, and customers have to either stick to the hash of the key
> > > > > > >> (random distribution, although it would guarantee the same key goes
> > > > > > >> to the same partition) or create their own logic on the producer side
> > > > > > >> (i.e. by sharing memory).
> > > > > >> Am I missing something?
> > > > > >>
> > > > > >> Thank you,
> > > > > >>
> > > > > >> Vinicius Scheidegger
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
