It is also worth mentioning that this applies only to producers; consumers always use ZooKeeper for load balancing and co-ordination. Logically this makes sense: partitioning production is trivial if you don't care about the semantics of key=>partition assignment, but partitioning consumption is more complex because you need to divide up the partitions exactly amongst the set of all consumers.
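To make the asymmetry concrete, here is a minimal sketch in Python. It is not Kafka's code: the function names, the crc32 hash, and the round-robin assignment are all assumptions chosen for illustration. The point is that a producer can pick a partition on its own, while a consumer assignment is only correct if every consumer sees the same membership list, which is where ZooKeeper comes in.

```python
import random
import zlib


def random_partition(num_partitions, rng=random):
    """Producer side, no key semantics: any partition will do."""
    return rng.randrange(num_partitions)


def keyed_partition(key, num_partitions):
    """Producer side, key-based: the same key always maps to the same partition."""
    # crc32 is used only because it is stable across processes;
    # it is an illustrative choice, not necessarily what Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Consumer side: split partitions so each one has exactly one owner.

    Simple round-robin over sorted inputs (hypothetical scheme). Every
    consumer must call this with the same two lists to reach the same answer.
    """
    ordered_consumers = sorted(consumers)
    if not ordered_consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in ordered_consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered_consumers[index % len(ordered_consumers)]
        assignment[owner].append(partition)
    return assignment
```

For example, `assign_partitions(range(5), ["c2", "c1"])` gives partitions 0, 2 and 4 to `c1` and partitions 1 and 3 to `c2`. If one consumer computed this with a stale view of the group, two consumers could own the same partition or a partition could be left unowned.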
-jay

On Sat, Nov 5, 2011 at 1:19 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> The motivation here is that literally every production process at
> LinkedIn sends messages to Kafka as part of either user tracking or
> operational monitoring or both. We are wary of adding that many zk
> connections and watches, so we run this first tier through a simple L2 load
> balancer that just randomly balances connections over brokers. The good
> part about this is that we can do zookeeper upgrades without redeploying
> all the production apps to upgrade their zk jar.
>
> As Neha says, the zk producer is used for key-based partitioning by the
> smaller number of producers who need that.
>
> -Jay
>
>
> On Sat, Nov 5, 2011 at 11:56 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>
>> Mark,
>>
>> Most publishers at LinkedIn use a hardware load balancer approach.
>> These are configured to do a TCP healthcheck that monitors if the
>> kafka port on a broker is working. If it is, then requests are
>> forwarded to the broker. Some publishers though are using the software
>> load balancer based on zookeeper. Those applications want to do some
>> key based partitioning of data.
>>
>> Thanks,
>> Neha
>>
>> On Sat, Nov 5, 2011 at 11:49 AM, Mark <static.void....@gmail.com> wrote:
>> > Sorry but I'm a bit confused now. So at LinkedIn you use a loadbalancer
>> > instead of ZooKeeper or do you use it in conjunction with ZooKeeper?
>> >
>> > Thanks
>> >
>> > On 11/4/11 7:09 PM, Jun Rao wrote:
>> >>
>> >> broker.list is used in the producer property file. One caveat is that the
>> >> broker.list approach doesn't do healthcheck. Which means that if a broker
>> >> goes down, the client could still try to send messages to it. At LinkedIn,
>> >> we rely on a load balancer to do healthcheck for us. The zk-based producer,
>> >> on the other hand, does health check.
>> >>
>> >> You can find out more details about our ZK design in our design page in the
>> >> website or the paper in
>> >>
>> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
>> >>
>> >> Jun
>> >>
>> >> On Fri, Nov 4, 2011 at 6:52 PM, Mark <static.void....@gmail.com> wrote:
>> >>
>> >>> I just noticed that there is an option to not use Zookeeper and instead
>> >>> one can use a static list of brokers (#9 on
>> >>> http://incubator.apache.org/kafka/quickstart.html).
>> >>> Do I put this list in server.properties?
>> >>>
>> >>> It doesn't seem like you save much either way as you have to either
>> >>> a) list out all the nodes in the zookeeper quorum in
>> >>> zookeeper.properties
>> >>> b) list out static brokers in server.properties.
>> >>>
>> >>> What are the benefits of using ZooKeeper over a static list? Can someone
>> >>> also explain how Kafka uses ZooKeeper?
>> >>>
>> >>> Thanks