Jun, I'm not sure that's enough.
A callback may not be enough since we can't be sure that there are no events from that partition being processed while the new consumer starts processing events from that partition. I think that a consumer should be handed a partition only after we're sure there is no other consumer that is reading or *processing *events from that partition. The only way to achieve that is, I think, by some kind of acknowledgment from the consumer side that it is ready to give up the partition (e.g. after gracefully stopped working internally with those events) I know that means that there is a need to consider the extreme cases here, but still I think we can't do without the consumer's 'acknoledment' without being subject to race scenarios. Thanks, Guy On Thu, Nov 1, 2012 at 4:56 PM, Jun Rao <jun...@gmail.com> wrote: > Guy, > > Yes, this is possible. One solution that we have been thinking about is > that if a rebalance happens, each consumer can somehow get a callback that > indicates the set of partitions being consumed may have changed. Will this > address your concern? > > Thanks, > > Jun > > On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <guy.pe...@gmail.com> wrote: > > > One more possible race might happen when the partition number is fixed > but > > consumer(s) are added/removed > > For example: If I have a consumer reading data from two partitions > > (partition one and partition two), and a new consumer is added, the > result > > will be that each consumer will consume from one partition > > let's say that the 'old' consumer will continue with partition one while > > the new consumer will process the data from partition two > > > > but, suppose that partition two held events that belong to event id 'x', > > and that partition is now consumed by the new consumer, > > Since consumers might reside on different machines and they are possibly > > multithreaded processes, there might be a situation that other event ids > > 'x' are already 'in the internal queues' and are being processed > > by the first consumer (events that were read/entered the first consumer > > before the new consumer appeared but are being processed or wait to > > processed within the 'old' consumer) and that means that there is a > > possibility that those events are being processed simultaneously by the > two > > consumers (since the new consumer will start reading events that might be > > of id 'x' and that might be then processed in parallel with event ids 'x' > > in the old consumer) > > > > If that is a possible scenario then when a new consumer is starting there > > should be some kind of 'consumers sync' > > > > > > > > > > > > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <jun...@gmail.com> wrote: > > > > > Guy, > > > > > > This is really an issue with changing # of partitions. If # of > partitions > > > changes for a topic, in the transition phase, messages used to be > > delivered > > > to the same partition could be delivered to different partitions and > > their > > > consumption ordering is non-deterministic (since ordered consumption is > > > only guaranteed within a partition). > > > > > > In 0.7, # of partitions increases as new brokers are added. In 0.8, # > of > > > partitions is set at topic creation time and will stay the same when > new > > > brokers are added. > > > > > > Thanks, > > > > > > Jun > > > > > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <guy.pe...@gmail.com> > wrote: > > > > > > > Hi, > > > > > > > > As I learn and plan to use Kafka, I'm concirned about possible race > > > > condition when brokers/consumers are added or removed. > > > > > > > > Say I have a topic that is devide into two partitions, where > consumers > > > are > > > > deviding the mssages between those two partitions by ,say, modulo > > > event-id, > > > > where events with the same event ids should be processed by the order > > of > > > > their arrival, that will work since as I said, I will devide the > > incoming > > > > events by their event-id % number_of_partitions > > > > > > > > Now, when a new paratition is added, there might be situations where > > > events > > > > with event-id 'x', will still be in the first broker, while new ones, > > > with > > > > event-id 'x', are added to the new paratition > > > > which may result in those events being processed in parallel, what > am i > > > > missing? > > > > > > > > Thanks, > > > > > > > > Guy > > > > > > > > > >