One more possible race might happen when the number of partitions is fixed
but consumers are added or removed.

For example: if I have one consumer reading data from two partitions
(partition one and partition two) and a new consumer is added, the result
will be that each consumer consumes from one partition. Let's say that the
'old' consumer continues with partition one while the new consumer
processes the data from partition two.
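
To make the setup concrete, here is a minimal sketch of how ownership of a
partition moves when a consumer joins. It is only an illustration: the
assign() helper and the consumer names are made up for this mail, and the
contiguous split is my assumption, not necessarily what Kafka's rebalancing
actually does.

```python
def assign(partitions, consumers):
    """Split partitions into contiguous chunks, one chunk per consumer.

    Hypothetical helper for illustration only; this is not a Kafka API.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    result = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers get one partition more than the rest.
        size = base + (1 if i < extra else 0)
        result[consumer] = partitions[start:start + size]
        start += size
    return result


# One consumer owns both partitions, then a second consumer joins.
before = assign([1, 2], ["consumer-a"])
after = assign([1, 2], ["consumer-a", "consumer-b"])

# Partitions that the old consumer owned before but no longer owns.
moved = [p for p in before["consumer-a"] if p not in after["consumer-a"]]
```

Here partition two is the one that changes owner, which is the partition
the rest of this mail is about.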

But suppose that partition two holds events that belong to event id 'x',
and that partition is now consumed by the new consumer. Since consumers
might reside on different machines and are possibly multithreaded
processes, other events with id 'x' might already be sitting in the
internal queues of the first consumer (events that were read by the first
consumer before the new consumer appeared, but are still being processed or
waiting to be processed there). That means those events could be processed
simultaneously by the two consumers: the new consumer starts reading events
of id 'x' from partition two while earlier events of id 'x' are still in
flight in the old consumer.
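
A minimal sketch of the overlap I am worried about, using plain Python
queues rather than a real Kafka client. The event id 7, the queue sizes and
the point where the new consumer starts reading are all assumptions chosen
for illustration; where a new consumer really starts depends on the
committed offset.

```python
from collections import deque

NUM_PARTITIONS = 2  # fixed; only the set of consumers changes


def partition_for(event_id):
    # Same modulo scheme as in my first mail, numbering partitions from 1.
    return event_id % NUM_PARTITIONS + 1


# Events are (event_id, sequence_number); id 7 plays the role of 'x'.
X = 7
log = [(X, seq) for seq in range(6)]
assert all(partition_for(event_id) == 2 for event_id, _ in log)

# The old consumer fetched the first four events into its internal queue
# and finished only the first one before the new consumer appeared.
old_queue = deque(log[:4])
old_queue.popleft()

# After the rebalance the new consumer owns partition two. Assumption: it
# starts right after the last event the old consumer had fetched.
new_queue = deque(log[4:])


def ids_in_flight(queue):
    return {event_id for event_id, _ in queue}


# Event ids that both consumers hold unprocessed work for at the same time.
overlap = ids_in_flight(old_queue) & ids_in_flight(new_queue)
```

In this sketch overlap contains id 7, so nothing stops the two consumers
from working on events of the same id at the same time.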

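If that is a possible scenario, then when a new consumer starts there
should be some kind of 'consumers sync', so that the old consumer finishes
its in-flight events for a partition before the new consumer begins reading
from it.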
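
As a rough idea of what I mean by such a sync, here is a toy sketch in
which the old owner drains its internal queue before the new owner is
allowed to fetch. The Consumer class, hand_over() and the shared journal
are hypothetical names I made up; I am not claiming Kafka provides this
behaviour.

```python
from collections import deque


class Consumer:
    """Toy consumer with an internal queue; not a Kafka client."""

    def __init__(self, name, journal):
        self.name = name
        self.queue = deque()    # fetched but not yet processed
        self.journal = journal  # shared record of (consumer, offset, id)

    def fetch(self, events):
        self.queue.extend(events)

    def drain(self):
        # Process every event that was already fetched, in fetch order.
        while self.queue:
            offset, event_id = self.queue.popleft()
            self.journal.append((self.name, offset, event_id))


def hand_over(old, new, log, next_offset):
    """Hypothetical sync step: the old owner drains first, and only then
    does the new owner start fetching from next_offset."""
    old.drain()
    new.fetch(log[next_offset:])


journal = []
log = [(offset, "x") for offset in range(6)]  # partition two, all id 'x'
old, new = Consumer("old", journal), Consumer("new", journal)

old.fetch(log[:4])  # fetched before the new consumer appeared
hand_over(old, new, log, next_offset=4)
new.drain()

offsets = [offset for _, offset, _ in journal]
owners = [name for name, _, _ in journal]
```

With this ordering the events of id 'x' are processed strictly by offset,
first by the old consumer and then by the new one. The price is that the
partition is not consumed while the old owner drains.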





On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <jun...@gmail.com> wrote:

> Guy,
>
> This is really an issue with changing the # of partitions. If the # of
> partitions changes for a topic, then in the transition phase messages that
> used to be delivered to the same partition could be delivered to different
> partitions, and their consumption ordering is non-deterministic (since
> ordered consumption is only guaranteed within a partition).
>
> In 0.7, # of partitions increases as new brokers are added. In 0.8, # of
> partitions is set at topic creation time and will stay the same when new
> brokers are added.
>
> Thanks,
>
> Jun
>
> On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <guy.pe...@gmail.com> wrote:
>
> > Hi,
> >
> > As I learn and plan to use Kafka, I'm concerned about a possible race
> > condition when brokers/consumers are added or removed.
> >
> > Say I have a topic that is divided into two partitions, where the
> > messages are divided between those two partitions by, say, modulo
> > event-id, and events with the same event id should be processed in the
> > order of their arrival. That will work since, as I said, I will divide
> > the incoming events by their event-id % number_of_partitions.
> >
> > Now, when a new partition is added, there might be situations where
> > events with event-id 'x' will still be in the first broker, while new
> > ones with event-id 'x' are added to the new partition, which may result
> > in those events being processed in parallel. What am I missing?
> >
> > Thanks,
> >
> > Guy
> >
>
