Jun,

I'm not sure that's enough.

A callback may not be enough since we can't be sure that there are no
events from that partition being processed while the new consumer starts
processing events from that partition.

I think that a consumer should be handed a partition only after we're sure
there is no other consumer that is reading or *processing *events from that
partition.

The only way to achieve that is, I think, by some kind of acknowledgment
from the consumer side that it is ready to give up the partition (e.g.
after gracefully stopped working internally with those events)

I know that means that there is a need to consider the extreme cases here,
but still I think we can't do without the consumer's 'acknoledment' without
being subject to race scenarios.

Thanks,

Guy




On Thu, Nov 1, 2012 at 4:56 PM, Jun Rao <jun...@gmail.com> wrote:

> Guy,
>
> Yes, this is possible. One solution that we have been thinking about is
> that if a rebalance happens, each consumer can somehow get a callback that
> indicates the set of partitions being consumed may have changed. Will this
> address your concern?
>
> Thanks,
>
> Jun
>
> On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <guy.pe...@gmail.com> wrote:
>
> > One more possible race might happen when the partition number is fixed
> but
> > consumer(s) are added/removed
> > For example: If I have a consumer reading data from two partitions
> > (partition one and partition two), and a new consumer is added, the
> result
> > will be that each consumer will consume from one partition
> > let's say that the 'old' consumer will continue with partition one while
> > the new consumer will process the data from partition two
> >
> > but, suppose that partition two held events that belong to event id 'x',
> > and that partition is now consumed by the new consumer,
> > Since consumers might reside on different machines and they are possibly
> > multithreaded processes, there might be a situation that other event ids
> > 'x' are already 'in the internal queues' and are being processed
> > by the first consumer (events that were read/entered the first consumer
> > before the new consumer appeared but are being processed or wait to
> > processed within the 'old' consumer) and that means that there is a
> > possibility that those events are being processed simultaneously by the
> two
> > consumers (since the new consumer will start reading events that might be
> > of id 'x' and that might be then processed in parallel with event ids 'x'
> > in the old consumer)
> >
> > If that is a possible scenario then when a new consumer is starting there
> > should be some kind of 'consumers sync'
> >
> >
> >
> >
> >
> > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Guy,
> > >
> > > This is really an issue with changing # of partitions. If # of
> partitions
> > > changes for a topic, in the transition phase, messages used to be
> > delivered
> > > to the same partition could be delivered to different partitions and
> > their
> > > consumption ordering is non-deterministic (since ordered consumption is
> > > only guaranteed within a partition).
> > >
> > > In 0.7, # of partitions increases as new brokers are added. In 0.8, #
> of
> > > partitions is set at topic creation time and will stay the same when
> new
> > > brokers are added.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <guy.pe...@gmail.com>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > As I learn and plan to use Kafka, I'm concirned about possible race
> > > > condition when brokers/consumers are added or removed.
> > > >
> > > > Say I have a topic that is devide into two partitions, where
> consumers
> > > are
> > > > deviding the mssages between those two partitions by ,say, modulo
> > > event-id,
> > > > where events with the same event ids should be processed by the order
> > of
> > > > their arrival, that will work since as I said, I will devide the
> > incoming
> > > > events by their event-id % number_of_partitions
> > > >
> > > > Now, when a new paratition is added, there might be situations where
> > > events
> > > > with event-id 'x', will still be in the first broker, while new ones,
> > > with
> > > > event-id 'x', are added to the new paratition
> > > > which may result in those events being processed in parallel, what
> am i
> > > > missing?
> > > >
> > > > Thanks,
> > > >
> > > > Guy
> > > >
> > >
> >
>

Reply via email to