See my inline replies below. Thanks,
Jun

On Tue, Aug 30, 2011 at 8:36 AM, Roman Garcia <rgar...@dridco.com> wrote:

>> Roman,
>> Without replication, Kafka can lose messages permanently if the
>> underlying storage system is damaged. Setting that aside, there are 2
>> ways that you can achieve HA now. In either case, you need to set up a
>> Kafka cluster with at least 2 brokers.
>
> Thanks for the clarification, Jun. But even then, with replication, you
> could still lose messages, right?

If you do synchronous replication with a replication factor >1 and there
is only 1 failure, you won't lose any messages.

>> [...] Unconsumed messages on that broker will not be available for
>> consumption until the broker comes up again.
>
> How does a consumer fetch those "old" messages, given that it did
> already fetch "new" messages at a higher offset? What am I missing?

There is one offset per topic/partition. If a partition is not available
because a broker is down, its offset in the consumer won't grow anymore.

>> The second approach is to use the built-in ZK-based software load
>> balancer in Kafka (by setting zk.connect in the producer config). In
>> this case, we rely on ZK to detect broker failures.
>
> This is the approach I've tried. I did use zk.connect.
> I started everything locally:
> - 2 Kafka brokers (broker ids 0 and 1, single partition)
> - 3 ZooKeeper nodes (all on a single box) with different election
>   ports and different fs paths/ids
> - 5 producer threads sending <1k msgs
>
> Then I killed one of the Kafka brokers, and all my producer threads
> died.

That could be a bug. Are you using trunk? Any errors/exceptions in the log?

> What am I doing wrong?
>
> Thanks!
> Roman
>
> -----Original Message-----
> From: Jun Rao [mailto:jun...@gmail.com]
> Sent: Tuesday, August 30, 2011 11:44 AM
> To: kafka-users@incubator.apache.org
> Subject: Re: HA / failover
>
> Roman,
>
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged.
> Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> The first approach is to put the hosts of all Kafka brokers behind a
> VIP and rely on a hardware load balancer to do health checks and
> routing. In that case, all producers send data through the VIP. If one
> of the brokers is down temporarily, the load balancer will direct the
> produce requests to the rest of the brokers. Unconsumed messages on
> that broker will not be available for consumption until the broker
> comes up again.
>
> The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 30, 2011 at 7:18 AM, Roman Garcia <rgar...@dridco.com>
> wrote:
>
> > Hi, I'm trying to figure out what my prod environment should look
> > like, and still I don't seem to understand how to achieve HA / FO
> > conditions.
> >
> > I realize this is going to be fully supported once there is
> > replication, right?
> >
> > But what about right now? How do you guys achieve this?
> >
> > I understand at least LinkedIn has a Kafka cluster deployed.
> >
> > - How do you guys ensure no messages get lost before the flush to
> > disk happens?
> >
> > - How did you manage to always have a broker available and redirect
> > producers to those during a failure?
> > I've tried using the Producer class with "sync" type and ZooKeeper,
> > and killing one of two brokers, but I got an exception. Should I
> > handle and retry then?
> >
> > So, to sum up, any pointer on how I should set up a prod env will be
> > appreciated! Any doc I might have missed or a simple short example
> > would help.
> > Thanks!
> > Roman
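For anyone reading the archive: the ZK-based producer setup discussed in this
thread boils down to a handful of properties. Below is a minimal sketch of a
0.7-era configuration, assuming a local 3-node ZooKeeper ensemble on ports
2181-2183 like Roman's test setup; the hostnames, ports, and topic layout are
illustrative, not taken from the thread.

```java
import java.util.Properties;

// Sketch of a Kafka 0.7-style producer configuration that uses the
// built-in ZK-based load balancer instead of a fixed broker list.
// The ensemble address is an assumption matching the 3-node local
// ZooKeeper setup described above.
public class ProducerConfigSketch {

    public static Properties zkProducerProps() {
        Properties props = new Properties();
        // Point the producer at the ZooKeeper ensemble; the producer
        // discovers live brokers through ZK, so a dead broker is
        // dropped from the routing set when ZK detects the failure.
        props.put("zk.connect", "localhost:2181,localhost:2182,localhost:2183");
        // Message encoder used by the 0.7 producer API.
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // "sync" makes send() block until the broker accepts the request,
        // so failures surface as exceptions in the calling thread.
        props.put("producer.type", "sync");
        return props;
    }

    public static void main(String[] args) {
        Properties props = zkProducerProps();
        System.out.println("zk.connect = " + props.getProperty("zk.connect"));
    }
}
```

With the real 0.7 client, these properties would be passed to
`new ProducerConfig(props)` and then `new Producer<String, String>(config)`;
the sketch stops at the Properties object so it stays self-contained.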
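On Roman's "should I handle and retry?" question: a common pattern with a
synchronous producer is to wrap each send in a small retry loop so that a
transient broker failure does not kill the producer thread. A minimal sketch
follows; the `Sender` interface here is a stand-in for the real producer
(it is not part of Kafka's API), and the attempt count and backoff are
arbitrary choices.

```java
// Retry-with-backoff sketch for surviving a temporary broker failure
// when using a synchronous producer. The Sender interface is a
// hypothetical stand-in for the real Kafka producer.
public class RetrySendSketch {

    public interface Sender {
        void send(String message) throws Exception;
    }

    // Try up to maxAttempts times; back off between attempts to give
    // ZK time to refresh the live-broker list; rethrow the last error
    // if every attempt fails.
    public static void sendWithRetry(Sender sender, String msg,
                                     int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send(msg);
                return; // delivered
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a broker that fails twice and then recovers.
        final int[] calls = {0};
        Sender flaky = m -> {
            if (++calls[0] < 3) throw new Exception("broker down");
        };
        sendWithRetry(flaky, "hello", 5, 10);
        System.out.println("delivered after " + calls[0] + " attempts");
    }
}
```

If all attempts fail, the exception still propagates, so the application can
decide whether to drop the message or spill it somewhere durable; a plain
retry loop alone does not guarantee delivery.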