Then, you may be interested in our consume client redesign: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
Thanks, Jun On Fri, Nov 2, 2012 at 7:17 AM, Guy Peleg <guy.pe...@gmail.com> wrote: > Jun, > > On that note, waiting to consumer's acknowledgment can be with configurable > timeout ranging from blocking to not wait at all > > Thanks, > > Guy > > > > On Fri, Nov 2, 2012 at 12:38 PM, Guy Peleg <guy.pe...@gmail.com> wrote: > > > Jun, > > > > I'm not sure that's enough. > > > > A callback may not be enough since we can't be sure that there are no > > events from that partition being processed while the new consumer starts > > processing events from that partition. > > > > I think that a consumer should be handed a partition only after we're > sure > > there is no other consumer that is reading or *processing *events from > > that partition. > > > > The only way to achieve that is, I think, by some kind of acknowledgment > > from the consumer side that it is ready to give up the partition (e.g. > > after gracefully stopped working internally with those events) > > > > I know that means that there is a need to consider the extreme cases > here, > > but still I think we can't do without the consumer's 'acknoledment' > without > > being subject to race scenarios. > > > > Thanks, > > > > Guy > > > > > > > > > > On Thu, Nov 1, 2012 at 4:56 PM, Jun Rao <jun...@gmail.com> wrote: > > > >> Guy, > >> > >> Yes, this is possible. One solution that we have been thinking about is > >> that if a rebalance happens, each consumer can somehow get a callback > that > >> indicates the set of partitions being consumed may have changed. Will > this > >> address your concern? > >> > >> Thanks, > >> > >> Jun > >> > >> On Thu, Nov 1, 2012 at 12:10 AM, Guy Peleg <guy.pe...@gmail.com> wrote: > >> > >> > One more possible race might happen when the partition number is fixed > >> but > >> > consumer(s) are added/removed > >> > For example: If I have a consumer reading data from two partitions > >> > (partition one and partition two), and a new consumer is added, the > >> result > >> > will be that each consumer will consume from one partition > >> > let's say that the 'old' consumer will continue with partition one > while > >> > the new consumer will process the data from partition two > >> > > >> > but, suppose that partition two held events that belong to event id > 'x', > >> > and that partition is now consumed by the new consumer, > >> > Since consumers might reside on different machines and they are > possibly > >> > multithreaded processes, there might be a situation that other event > ids > >> > 'x' are already 'in the internal queues' and are being processed > >> > by the first consumer (events that were read/entered the first > consumer > >> > before the new consumer appeared but are being processed or wait to > >> > processed within the 'old' consumer) and that means that there is a > >> > possibility that those events are being processed simultaneously by > the > >> two > >> > consumers (since the new consumer will start reading events that might > >> be > >> > of id 'x' and that might be then processed in parallel with event ids > >> 'x' > >> > in the old consumer) > >> > > >> > If that is a possible scenario then when a new consumer is starting > >> there > >> > should be some kind of 'consumers sync' > >> > > >> > > >> > > >> > > >> > > >> > On Wed, Oct 31, 2012 at 4:57 PM, Jun Rao <jun...@gmail.com> wrote: > >> > > >> > > Guy, > >> > > > >> > > This is really an issue with changing # of partitions. If # of > >> partitions > >> > > changes for a topic, in the transition phase, messages used to be > >> > delivered > >> > > to the same partition could be delivered to different partitions and > >> > their > >> > > consumption ordering is non-deterministic (since ordered consumption > >> is > >> > > only guaranteed within a partition). > >> > > > >> > > In 0.7, # of partitions increases as new brokers are added. In 0.8, > # > >> of > >> > > partitions is set at topic creation time and will stay the same when > >> new > >> > > brokers are added. > >> > > > >> > > Thanks, > >> > > > >> > > Jun > >> > > > >> > > On Wed, Oct 31, 2012 at 4:12 AM, Guy Peleg <guy.pe...@gmail.com> > >> wrote: > >> > > > >> > > > Hi, > >> > > > > >> > > > As I learn and plan to use Kafka, I'm concirned about possible > race > >> > > > condition when brokers/consumers are added or removed. > >> > > > > >> > > > Say I have a topic that is devide into two partitions, where > >> consumers > >> > > are > >> > > > deviding the mssages between those two partitions by ,say, modulo > >> > > event-id, > >> > > > where events with the same event ids should be processed by the > >> order > >> > of > >> > > > their arrival, that will work since as I said, I will devide the > >> > incoming > >> > > > events by their event-id % number_of_partitions > >> > > > > >> > > > Now, when a new paratition is added, there might be situations > where > >> > > events > >> > > > with event-id 'x', will still be in the first broker, while new > >> ones, > >> > > with > >> > > > event-id 'x', are added to the new paratition > >> > > > which may result in those events being processed in parallel, what > >> am i > >> > > > missing? > >> > > > > >> > > > Thanks, > >> > > > > >> > > > Guy > >> > > > > >> > > > >> > > >> > > > > >