Re: Docs (again!)

Edward Smith Fri, 27 Apr 2012 13:25:46 -0700

Thanks for the encouragement, Jay.  I'm new to actually contributing
to OSS, so I'm still feeling out what the norm is.


Ed

On Fri, Apr 27, 2012 at 1:07 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hey Edward,
>
> We actually greatly appreciate the feedback. Docs always make sense to
> the person who wrote them, who has been working closely on the thing
> for many months, but it is much harder to get them into shape for
> others so that they really give the information that is needed. So
> your feedback is not nitpicking; it is actually very helpful.
>
> -Jay
>
> On Thu, Apr 26, 2012 at 3:13 PM, Edward Smith <esm...@stardotstar.org> wrote:
>> I swear I'm not nitpicking!  I'm working on ensuring I have my project
>> conceptually 'sane' before I get started, and I keep referring back to
>> the Kafka Design Docs to double check things. I did notice that my
>> suggested changes last time made it in, thanks to Jun or whoever put
>> in the change.  I think it is much clearer now.
>>
>> We have these two paragraphs in conflict (I think):
>>
>> ---first paragraph---
>> Currently, there is no built-in load balancing between the producers
>> and the brokers in Kafka; in our own usage we publish from a large
>> number of heterogeneous machines and so it is desirable that the
>> publisher not need any explicit knowledge of the cluster topology. We
>> rely on a hardware load balancer to distribute the producer load
>> across multiple brokers. We will consider adding this in a future
>> release to allow semantic partitioning of messages (i.e. publishing
>> all messages to a particular broker based on some id to ensure an
>> ordered stream of updates within that id).
>>
>> ---second paragraph---
>> Automatic producer load balancing
>>
>> Kafka supports client-side load balancing for message producers or use
>> of a dedicated load balancer to balance TCP connections. A dedicated
>> layer-4 load balancer works by balancing TCP connections over Kafka
>> brokers. In this configuration all messages from a given producer go
>> to a single broker. The advantage of using a layer-4 load balancer is
>> that each producer only needs a single TCP connection, and no
>> connection to zookeeper is needed. The disadvantage is that the
>> balancing is done at the TCP connection level, and hence it may not be
>> well balanced (if some producers produce many more messages than
>> others, evenly dividing up the connections per broker may not result
>> in evenly dividing up the messages per broker).
>>
>> Client-side zookeeper-based load balancing solves some of these
>> problems. It allows the producer to dynamically discover new brokers,
>> and balance load on a per-request basis. Likewise it allows the
>> producer to partition data according to some key instead of randomly,
>> which enables stickiness on the consumer (e.g. partitioning data
>> consumption by user id). This feature is called "semantic
>> partitioning", and is described in more detail below.
>>
>> The working of the zookeeper-based load balancing is described below.
>> Zookeeper watchers are registered on the following events:
>> <snip>
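The two quoted paragraphs contrast connection-level (layer-4) balancing with per-request balancing and key-based ("semantic") partitioning. Here is a toy Python sketch of the imbalance the second paragraph describes. It is not Kafka's producer API: the function names, the round-robin assignment of connections to brokers, and the use of CRC32 as the key hash are assumptions chosen for illustration, not necessarily what Kafka's own partitioner does.

```python
import itertools
import zlib
from collections import Counter


def layer4_load(msg_counts, brokers):
    """Hypothetical model of connection-level balancing: each producer opens
    one TCP connection, connections are assigned to brokers round-robin, and
    every message from that producer travels over its single connection."""
    load = Counter()
    for i, count in enumerate(msg_counts.values()):
        load[brokers[i % len(brokers)]] += count
    return load


def per_request_load(msg_counts, brokers):
    """Hypothetical model of per-request balancing: each message goes to the
    next broker in turn, regardless of which producer sent it."""
    load = Counter()
    next_broker = itertools.cycle(brokers)
    for count in msg_counts.values():
        for _ in range(count):
            load[next(next_broker)] += 1
    return load


def semantic_partition(key, num_partitions):
    """Key-based partitioning (assumed hash: CRC32, which is stable across
    runs, unlike Python's built-in str hash). Every message with the same key
    maps to the same partition, so per-key ordering can be preserved."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # One heavy producer and three light ones, two brokers.
    msgs = {"p1": 1000, "p2": 10, "p3": 10, "p4": 10}
    brokers = ["broker-0", "broker-1"]
    print(dict(layer4_load(msgs, brokers)))       # {'broker-0': 1010, 'broker-1': 20}
    print(dict(per_request_load(msgs, brokers)))  # {'broker-0': 515, 'broker-1': 515}
    print(semantic_partition("user-42", 4) == semantic_partition("user-42", 4))  # True
```

In this example each broker gets exactly two connections, yet one broker carries 1010 messages and the other 20; per-request balancing evens the message counts out, and the key hash keeps all messages for a given id on one partition, which is the "semantic partitioning" the first paragraph says is not yet supported.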
