Thanks for the encouragement, Jay. I'm new to actually contributing to OSS, so I'm still feeling out what the norm is.
Ed

On Fri, Apr 27, 2012 at 1:07 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hey Edward,
>
> We actually greatly appreciate the feedback. Docs always make sense to
> the person who wrote them, who has been working closely on the thing
> for many months, but it is much harder to get them into shape for
> others so that they really give the information that is needed. So
> your feedback is not nitpicking; it is actually very helpful.
>
> -Jay
>
> On Thu, Apr 26, 2012 at 3:13 PM, Edward Smith <esm...@stardotstar.org> wrote:
>> I swear I'm not nitpicking! I'm working on ensuring I have my project
>> conceptually 'sane' before I get started, and I keep referring back to
>> the Kafka Design Docs to double check things. I did notice that my
>> suggested changes last time made it in, thanks to Jun or whoever put
>> in the change. I think it is much clearer now.
>>
>> We have these two paragraphs in conflict (I think):
>>
>> ---first paragraph---
>> Currently, there is no built-in load balancing between the producers
>> and the brokers in Kafka; in our own usage we publish from a large
>> number of heterogeneous machines and so it is desirable that the
>> publisher not need any explicit knowledge of the cluster topology. We
>> rely on a hardware load balancer to distribute the producer load
>> across multiple brokers. We will consider adding this in a future
>> release to allow semantic partitioning of messages (i.e. publishing
>> all messages to a particular broker based on some id to ensure an
>> ordered stream of updates within that id).
>>
>> ---second paragraph---
>> Automatic producer load balancing
>>
>> Kafka supports client-side load balancing for message producers or use
>> of a dedicated load balancer to balance TCP connections. A dedicated
>> layer-4 load balancer works by balancing TCP connections over Kafka
>> brokers. In this configuration all messages from a given producer go
>> to a single broker. The advantage of using a layer-4 load balancer is
>> that each producer only needs a single TCP connection, and no
>> connection to zookeeper is needed. The disadvantage is that the
>> balancing is done at the TCP connection level, and hence it may not be
>> well balanced (if some producers produce many more messages than
>> others, evenly dividing up the connections per broker may not result
>> in evenly dividing up the messages per broker).
>>
>> Client-side zookeeper-based load balancing solves some of these
>> problems. It allows the producer to dynamically discover new brokers,
>> and balance load on a per-request basis. Likewise it allows the
>> producer to partition data according to some key instead of randomly,
>> which enables stickiness on the consumer (e.g. partitioning data
>> consumption by user id). This feature is called "semantic
>> partitioning", and is described in more detail below.
>>
>> The working of the zookeeper-based load balancing is described below.
>> Zookeeper watchers are registered on the following events:
>> <snip>
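
For anyone following the thread, the two behaviours described in the quoted second paragraph can be sketched roughly in Python. This is only an illustration under stated assumptions, not Kafka's actual producer code: the function names, the CRC32 hash, the round-robin assignment of connections, and the sample message counts are all hypothetical.

```python
import zlib
from collections import Counter


def connection_level_load(message_counts, num_brokers):
    """Layer-4 style balancing (assumed round-robin over connections).

    Each producer holds one TCP connection, so all of its messages
    land on the single broker that connection was assigned to.
    """
    load = Counter()
    for producer_index, count in enumerate(message_counts):
        load[producer_index % num_brokers] += count
    return load


def partition_for(key, num_partitions):
    """Key-based ("semantic") partitioning with a stable hash.

    CRC32 is an arbitrary choice for this sketch; the point is only
    that the same key always maps to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical workload: four producers, one far chattier than the rest.
counts = [1000, 10, 10, 10]

# Connections are split evenly (two per broker), but messages are not.
print(dict(connection_level_load(counts, num_brokers=2)))

# With key-based partitioning, repeated keys stick to one partition.
for user_id in ["user-1", "user-2", "user-1"]:
    print(user_id, "->", partition_for(user_id, num_partitions=4))
```

With these sample counts, broker 0 receives 1010 messages and broker 1 receives 20, even though each broker has two connections. That is the imbalance the quoted text attributes to balancing at the TCP connection level. The second loop shows the "stickiness" property: both occurrences of user-1 resolve to the same partition.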