Guy,

Yes, this is possible. One solution we have been thinking about is that when a rebalance happens, each consumer could get a callback indicating that the set of partitions it consumes may have changed. Will this address your concern?
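The callback idea can be illustrated with a small single-threaded simulation. Everything in it is hypothetical: `SketchConsumer`, `on_partitions_revoked` and `rebalance` are illustrative names, not part of the Kafka 0.7/0.8 API. The point it shows is the ordering barrier: a partition moves to the new consumer only after the old consumer's callback has finished the events it had already fetched from that partition, so two consumers never process event id 'x' at the same time.

```python
from collections import deque


class SketchConsumer:
    """Hypothetical consumer that buffers fetched events per partition."""

    def __init__(self, name):
        self.name = name
        self.owned = set()    # partitions this consumer currently owns
        self.in_flight = {}   # partition -> deque of fetched, unprocessed events
        self.processed = []   # (consumer, partition, event) in processing order

    def assign(self, partitions):
        for p in partitions:
            self.owned.add(p)
            self.in_flight.setdefault(p, deque())

    def fetch(self, partition, event):
        # Only the current owner may read from a partition.
        if partition not in self.owned:
            raise RuntimeError(f"{self.name} does not own partition {partition}")
        self.in_flight[partition].append(event)

    def process_one(self, partition):
        event = self.in_flight[partition].popleft()
        self.processed.append((self.name, partition, event))

    def on_partitions_revoked(self, partitions):
        """Hypothetical rebalance callback: finish everything already
        fetched for the revoked partitions, then give them up."""
        for p in partitions:
            while self.in_flight.get(p):
                self.process_one(p)
            self.in_flight.pop(p, None)
            self.owned.discard(p)


def rebalance(old, new, moved):
    """Hand partitions to `new` only after `old` has drained them."""
    old.on_partitions_revoked(moved)
    new.assign(moved)


if __name__ == "__main__":
    old = SketchConsumer("old")
    old.assign({1, 2})
    old.fetch(2, "x#1")
    old.fetch(2, "x#2")  # fetched but still queued when the new consumer joins
    new = SketchConsumer("new")
    rebalance(old, new, {2})
    new.fetch(2, "x#3")
    new.process_one(2)
    print(old.processed + new.processed)
    # [('old', 2, 'x#1'), ('old', 2, 'x#2'), ('new', 2, 'x#3')]
```

In a real multithreaded consumer the callback would also have to wait for worker threads to empty their queues (or drop the buffered events and let the new owner re-read them from the last committed offset) before returning; the simulation only models the hand-off order.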
Thanks,
Jun

On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <guy.pe...@gmail.com> wrote:

> One more possible race might happen when the number of partitions is fixed
> but consumers are added or removed.
> For example, if I have a consumer reading data from two partitions
> (partition one and partition two) and a new consumer is added, each
> consumer will end up consuming from one partition. Let's say the 'old'
> consumer continues with partition one while the new consumer processes
> the data from partition two.
>
> But suppose partition two held events that belong to event id 'x', and
> that partition is now consumed by the new consumer. Since consumers might
> reside on different machines and are possibly multithreaded processes,
> other events with id 'x' might already be in the old consumer's internal
> queues: events that were read by the old consumer before the new consumer
> appeared but are still being processed or waiting to be processed. That
> means those events could be processed simultaneously by the two consumers,
> since the new consumer will start reading events that may have id 'x' and
> process them in parallel with the 'x' events still in the old consumer.
>
> If that is a possible scenario, then when a new consumer starts there
> should be some kind of 'consumer sync'.
>
> On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > Guy,
> >
> > This is really an issue with changing the # of partitions. If the # of
> > partitions for a topic changes, then during the transition phase,
> > messages that used to be delivered to the same partition could be
> > delivered to different partitions, and their consumption order is
> > non-deterministic (since ordered consumption is only guaranteed within
> > a partition).
> >
> > In 0.7, the # of partitions increases as new brokers are added.
> > In 0.8, the # of partitions is set at topic creation time and stays the
> > same when new brokers are added.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <guy.pe...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > As I learn and plan to use Kafka, I'm concerned about possible race
> > > conditions when brokers/consumers are added or removed.
> > >
> > > Say I have a topic that is divided into two partitions, where consumers
> > > divide the messages between those two partitions by, say, event id
> > > modulo the partition count, and events with the same event id should be
> > > processed in the order of their arrival. That works since, as I said, I
> > > divide the incoming events by event-id % number_of_partitions.
> > >
> > > Now, when a new partition is added, there might be situations where
> > > events with event id 'x' are still in the first broker while new events
> > > with event id 'x' are added to the new partition, which may result in
> > > those events being processed in parallel. What am I missing?
> > >
> > > Thanks,
> > >
> > > Guy
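To make Jun's point about changing the partition count concrete, here is a minimal sketch of the routing Guy describes (event-id % number_of_partitions), assuming integer event ids. The function names are hypothetical and this is not Kafka's own partitioner code; it only shows which ids end up in a different partition when the count grows, which is exactly when older and newer events for the same id can sit in two partitions at once.

```python
def partition_for(event_id, num_partitions):
    """Route an event the way Guy describes: event_id % num_partitions."""
    return event_id % num_partitions


def remapped_ids(event_ids, old_n, new_n):
    """Event ids whose partition changes when the partition count
    goes from old_n to new_n."""
    return [e for e in event_ids
            if partition_for(e, old_n) != partition_for(e, new_n)]


if __name__ == "__main__":
    # Growing a topic from 2 to 3 partitions (e.g. a new broker in 0.7).
    print(remapped_ids(range(6), 2, 3))  # [2, 3, 4, 5]
    # Event id 4: older events sit in partition 0, newer ones go to partition 1.
    print(partition_for(4, 2), partition_for(4, 3))  # 0 1
```

Only ids whose remainder is unchanged (here 0 and 1) keep their partition. With a fixed partition count, as in 0.8, the mapping never changes, which is why the remaining race is the consumer-rebalance one Guy raises in his follow-up.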