Thanks Jay, I think I was conflating the concept of a partition with a (consumer) stream. This makes much more sense now.
-j

On Jun 12, 2012, at 2:34 PM, Jay Kreps wrote:

> I think a lot of these details are in the design doc; you may find that
> helpful (http://incubator.apache.org/kafka/design.html).
>
> To answer your question, it isn't the case that only one machine is
> consuming. All machines in the group will consume. The way it works is that
> each broker has some number of partitions. These partitions are divided up
> over the consumer machines. The data in the partition is delivered in order
> to whichever consumer is currently consuming that partition. ZooKeeper is
> used to balance the mapping of consumers to partitions. One consumer can
> have many partitions, but if you have more consumers than partitions, some
> will not have any work to do.
>
> -Jay
>
> On Tue, Jun 12, 2012 at 1:55 PM, Rodenburg, Jeff <jeff.rodenb...@teamaol.com> wrote:
>
>> Great, I'm running the quick start and can see that in operation.
>>
>> Ok, last question on this thread:
>>
>>> So if you have two consumer groups consuming a topic, and each consumer
>>> group has 4 machines in it, then a message published to this topic would be
>>> delivered to one machine in each of the two groups.
>>
>> How is topic load-balancing for consumers handled? For example, if a
>> consumer group has 4 machines in it (one consumer per machine), in reality
>> only one machine in the group is actually working. If I want multiple machines
>> handling items in a topic, how is that approach handled? I could see
>> producers generating more topics, and consumers subscribing to those
>> (making a high-volume topic more granular). What's the best practice when
>> consumer tasks on topic messages need to be handled by multiple consumers?
>>
>> -Jeff
>>
>> On Jun 12, 2012, at 11:46 AM, Jay Kreps wrote:
>>
>>> Basically the rule is this: "every message sent to the topic is delivered to
>>> one machine/process in each consumer group".
>>> So if you have two consumer groups consuming a topic, and each consumer
>>> group has 4 machines in it, then a message published to this topic would
>>> be delivered to one machine in each of the two groups.
>>>
>>> -Jay
>>>
>>> On Tue, Jun 12, 2012 at 11:34 AM, Rodenburg, Jeff <jeff.rodenb...@teamaol.com> wrote:
>>>
>>>> Thanks for the info, Jun.
>>>>
>>>>> if you just want each message to be consumed by a consumer, not a
>>>>> particular one
>>>>
>>>> What is meant by a particular consumer? Something along the lines of
>>>> Consumer #3 within a group needing message #123?
>>>>
>>>> Ok, next question:
>>>>
>>>> What is the relationship between topics and consumer groups? More to the
>>>> point, can I have multiple consumer groups that all consume the same topic?
>>>> For example, assume a set of producers is publishing to the topic "ABC".
>>>> Suppose I have multiple processes that take action on a given ABC message
>>>> -- process 1 handles billing, process 2 handles file management, process 3
>>>> handles history/archiving, etc. Can I structure multiple groups that
>>>> consume the same topic? How does partitioning work at that point?
>>>>
>>>> On Jun 12, 2012, at 10:11 AM, Jun Rao wrote:
>>>>
>>>>> Jeff,
>>>>>
>>>>> Your understanding is correct. Operationally, we have some JMX metrics
>>>>> that give consumer stats per topic. There is also a tool, CheckOffsetLag,
>>>>> that tells you how far behind a consumer is. As for coordination between
>>>>> producers and consumers: if you just want each message to be consumed by
>>>>> a consumer, not a particular one, there is no coordination needed.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>> On Tue, Jun 12, 2012 at 9:58 AM, Rodenburg, Jeff <jeff.rodenb...@teamaol.com> wrote:
>>>>>
>>>>>> Hi all -
>>>>>>
>>>>>> Just getting familiar with Kafka, and learning about consumer groups.
>>>>>> Hoping someone can provide some context here.
>>>>>>
>>>>>> As I understand it, consumers register with the broker and consume a
>>>>>> topic. Multiple consumers can consume a single topic, as a consumer group.
>>>>>> Each consumer actually gets a partition of messages, so there is no overlap
>>>>>> -- a single consumer within a group will receive a message on its
>>>>>> topic/partition. Consumer rebalancing is the process whereby members of a
>>>>>> consumer group are added to and/or dropped from the group, and partitions
>>>>>> are sorted/reassigned to the current consumer group members.
>>>>>>
>>>>>> Some questions:
>>>>>>
>>>>>> * Is this accurate? What am I missing?
>>>>>> * Operationally, is consumer "failover" basically service monitoring at
>>>>>>   the consumer process level?
>>>>>> * How much coordination is required between producers and consumers
>>>>>>   around partitioning? (Automated, configuration, etc.)
>>>>>> * How are topics monitored for SLA on throughput/load, i.e. spinning up
>>>>>>   consumers as needed for topic message spikes?
>>>>>>
>>>>>> Appreciate any further information and/or context anyone can share.
>>>>>>
>>>>>> cheers,
>>>>>> Jeff
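The delivery rule discussed in this thread (each message goes to exactly one consumer in every group, and a group's partitions are divided up over its members) can be illustrated with a small simulation. This is a minimal sketch, not Kafka's code: the range-style strategy in the hypothetical `assign_partitions` (sort partitions and consumers, hand out contiguous chunks) is an assumed, simplified model of rebalancing, the ZooKeeper coordination is left out entirely, and the group and consumer names are made up for the example.

```python
def assign_partitions(partitions, consumers):
    """Divide a topic's partitions over one group's consumers (range-style).

    Assumption: sorted partitions are handed out in contiguous chunks to
    sorted consumers; this models the idea, not Kafka's implementation.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, consumer in enumerate(members):
        # The first `extra` consumers each take one additional partition;
        # with more consumers than partitions, the rest get nothing.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = parts[start:start + count]
        start += count
    return assignment


def route(partition, groups, partitions):
    """Return the single consumer in each group that owns `partition`."""
    owners = {}
    for group, consumers in sorted(groups.items()):
        for consumer, owned in assign_partitions(partitions, consumers).items():
            if partition in owned:
                owners[group] = consumer
    return owners


# Hypothetical topic "ABC" with 4 partitions, consumed by two groups.
partitions = [0, 1, 2, 3]
groups = {
    "billing": ["b1", "b2", "b3", "b4", "b5"],  # 5 consumers for 4 partitions
    "archive": ["a1", "a2"],
}

print(assign_partitions(partitions, groups["billing"]))
# {'b1': [0], 'b2': [1], 'b3': [2], 'b4': [3], 'b5': []}
print(assign_partitions(partitions, groups["archive"]))
# {'a1': [0, 1], 'a2': [2, 3]}
print(route(2, groups, partitions))
# {'archive': 'a2', 'billing': 'b3'}
```

A message written to partition 2 reaches one consumer per group, and `b5` sits idle because the "billing" group has more consumers than partitions, which matches Jay's point that extra consumers beyond the partition count have no work to do.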
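Jun mentions CheckOffsetLag for seeing how far behind a consumer is. The usual way to express that lag is per partition: the latest offset in the log minus the offset the consumer has reached, summed over the partitions the group owns. The sketch below shows only that arithmetic under those assumptions; `consumer_lag` and the offset values are hypothetical, and it does not call the actual tool or read real offsets from ZooKeeper. The result is in whatever unit the offsets use.

```python
def consumer_lag(log_end_offsets, consumed_offsets):
    """Per-partition lag = latest offset in the log - offset the consumer has reached.

    Assumption: a partition with no recorded consumer offset counts from 0.
    """
    lag = {}
    for partition, end in log_end_offsets.items():
        consumed = consumed_offsets.get(partition, 0)
        # Clamp at zero so a stale end-offset reading never reports negative lag.
        lag[partition] = max(end - consumed, 0)
    return lag


# Hypothetical offsets for one consumer group on a 3-partition topic.
log_end = {0: 1500, 1: 900, 2: 1200}
consumed = {0: 1500, 1: 600, 2: 1000}

per_partition = consumer_lag(log_end, consumed)
print(per_partition)                # {0: 0, 1: 300, 2: 200}
print(sum(per_partition.values()))  # 500
```

A steadily growing total is the kind of signal Jeff's last question asks about for message spikes; per Jay's reply, though, adding consumers only helps up to the number of partitions, since any extras have nothing to consume.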