Thanks Jay, I think I was conflating the concept of a partition with a
(consumer) stream. This makes much more sense now.
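
To make the partition-to-consumer mapping described below concrete, here is a
minimal sketch. It assumes a simple contiguous range split over sorted names;
it is not Kafka's actual rebalance code, and `assign_partitions`, the
(broker, partition) tuples, and the consumer names are all hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Divide partitions over the consumers of ONE group.

    Illustrative only: a plain range split, not Kafka's real algorithm.
    Each partition goes to exactly one consumer, so ordering within a
    partition is preserved for whoever owns it.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment

    # Every consumer gets `base` partitions; the first `extra` get one more.
    base, extra = divmod(len(partitions), len(consumers))
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Two brokers with two partitions each: four partitions in total.
parts = [(broker, p) for broker in (0, 1) for p in (0, 1)]

# Three consumers: all of them get work, one owns two partitions.
print(assign_partitions(parts, ["c1", "c2", "c3"]))

# Six consumers: only four partitions exist, so two consumers sit idle.
print(assign_partitions(parts, ["c%d" % i for i in range(6)]))
```

Each consumer group would run this split independently, which is why a message
reaches one machine in every group.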

-j




On Jun 12, 2012, at 2:34 PM, Jay Kreps wrote:

> I think a lot of these details are in the design doc; you may find it
> helpful (http://incubator.apache.org/kafka/design.html).
> 
> To answer your question, it isn't the case that only one machine is
> consuming. All machines in the group will consume. The way it works is that
> each broker has some number of partitions. These partitions are divided up
> over the consumer machines. The data in the partition is delivered in order
> to whichever consumer is currently consuming that partition. Zookeeper is
> used to balance the mapping of consumers to partitions. One consumer can
> have many partitions, but if you have more consumers than partitions some
> will not have any work to do.
> 
> -Jay
> 
> On Tue, Jun 12, 2012 at 1:55 PM, Rodenburg, Jeff <jeff.rodenb...@teamaol.com> wrote:
> 
>> Great, I'm running the quick start and can see that in operation.
>> 
>> Ok, last question on this thread:
>> 
>>> So if you have two consumer groups consuming a topic, and each consumer
>> group has 4 machines in it, then a message published to this topic would be
>> delivered to one machine in each of the two groups.
>> 
>> How is topic load-balancing for consumers handled?  For example, if a
>> consumer group has 4 machines in it (consumer per machine), in reality only
>> one machine in the group is actually working.  If I want multiple machines
>> handling items in a topic, how is that approach handled? I could see
>> producers generating more topics, and consumers subscribing to those
>> (making a high-volume topic more granular).  What's best practice when
>> consumer tasks on topic messages need to be handled by multiple consumers?
>> 
>> -Jeff
>> 
>> 
>> 
>> 
>> 
>> On Jun 12, 2012, at 11:46 AM, Jay Kreps wrote:
>> 
>>> Basically the rule is this: "every message sent to the topic is delivered to
>>> one machine/process in each consumer group". So if you have two consumer
>>> groups consuming a topic, and each consumer group has 4 machines in it,
>>> then a message published to this topic would be delivered to one machine in
>>> each of the two groups.
>>> 
>>> -Jay
>>> 
>>> On Tue, Jun 12, 2012 at 11:34 AM, Rodenburg, Jeff <
>>> jeff.rodenb...@teamaol.com> wrote:
>>> 
>>>> Thanks for the info, Jun.
>>>> 
>>>>> if you just want each message to be consumed by a consumer, not a
>>>> particular one
>>>> 
>>>> What is intended to be a particular consumer? Something on the order of
>>>> Consumer #3 within a group needs message #123?
>>>> 
>>>> Ok, next question:
>>>> 
>>>> What is the relationship between topics and consumer groups? More to the
>>>> point, can I have multiple consumer groups that all consume the same topic?
>>>> For example, assume a set of producers are publishing to the topic "ABC".
>>>> Suppose I have multiple processes that take action on a given ABC message
>>>> -- process 1 handles billing, process 2 handles file management, process 3
>>>> handles history/archiving, etc.  Can I structure multiple groups that
>>>> consume the same topic? How does partitioning work at that point?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jun 12, 2012, at 10:11 AM, Jun Rao wrote:
>>>> 
>>>>> Jeff,
>>>>> 
>>>>> Your understanding is correct. Operationally, we have some JMX metrics
>>>>> that give consumer stats per topic. There is also a tool, CheckOffsetLag,
>>>>> that tells you how far behind a consumer is. For coordination between
>>>>> producers and consumers, if you just want each message to be consumed by
>>>>> a consumer, not a particular one, there is no coordination needed.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> On Tue, Jun 12, 2012 at 9:58 AM, Rodenburg, Jeff <jeff.rodenb...@teamaol.com> wrote:
>>>>> 
>>>>>> Hi all -
>>>>>> 
>>>>>> Just getting familiar with Kafka, and learning about consumer groups.
>>>>>> Hoping someone can provide some context here.
>>>>>> 
>>>>>> As I understand it, consumers register with the broker and consume a
>>>>>> topic. Multiple consumers can consume a single topic, as a consumer group.
>>>>>> Each consumer actually gets a partition of messages, so there is no overlap
>>>>>> -- a single consumer within a group will receive a message on its
>>>>>> topic/partition.  Consumer rebalancing is the process whereby members of a
>>>>>> consumer group are added and/or dropped from the group, and partitions are
>>>>>> sorted/reassigned to the current consumer group members.
>>>>>> 
>>>>>> Some questions:
>>>>>> 
>>>>>> *   Is this accurate? What am I missing?
>>>>>> *   Operationally, is consumer "failover" basically service monitoring at
>>>>>> the consumer process level?
>>>>>> *   How much coordination is required between producers and consumers
>>>>>> around partitioning? (Automated, configuration, etc.)
>>>>>> *   How are topics monitored for SLA on throughput/load, i.e. spinning up
>>>>>> consumers as needed for topic message spikes?
>>>>>> 
>>>>>> Appreciate any further information and/or context anyone can share.
>>>>>> 
>>>>>> cheers,
>>>>>> Jeff
>>>>>> 
>>>> 
>>>> 
>> 
>> 
