It occurs to me that we could do a better job with this error. There are really three things that might have happened: (1) you restarted Kafka within the ZK session timeout, in which case, as far as ZK is concerned, your old broker still exists (this is weird but actually correct behavior); (2) you have two brokers with the same id; or (3) ZK has a bug and is not deleting ephemeral nodes.
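
For what it's worth, here is a rough sketch of a message that spells out those three cases. It is written in Python purely for illustration and is not the actual ZkUtils code. The function names, the assumed `<creatorId>:<host>:<port>` layout of the node data, and the host/port hint at the end are all my own assumptions, based only on the conflict log line from this thread.

```python
from typing import NamedTuple


class BrokerRegistration(NamedTuple):
    """Hypothetical view of the data stored under /brokers/ids/<id>."""
    creator_id: str  # e.g. "10.98.20.109-1317681202194"
    host: str
    port: int


def parse_registration(data: str) -> BrokerRegistration:
    # Assumed layout: "<creatorId>:<host>:<port>". Split from the right so
    # that only the last two colons are treated as separators.
    creator_id, host, port = data.rsplit(":", 2)
    return BrokerRegistration(creator_id, host, int(port))


def conflict_message(path: str, new_data: str, stored_data: str) -> str:
    """Build an error message that lists the three likely causes."""
    new = parse_registration(new_data)
    stored = parse_registration(stored_data)
    lines = [
        f"Conflict in {path}: a broker is already registered under this id.",
        f"  data being registered:    {new_data}",
        f"  data stored in ZooKeeper: {stored_data}",
        "Possible causes:",
        "  (1) this broker was restarted within the ZooKeeper session timeout,",
        "      so its old ephemeral node has not expired yet; wait longer than",
        "      the timeout before restarting, or lower the timeout",
        "  (2) two brokers are configured with the same broker id",
        "  (3) ZooKeeper is not deleting an ephemeral node that it should",
    ]
    # Heuristic only (an assumption, not a proof): an identical host and port
    # suggests the same broker came back, a different one suggests a duplicate id.
    if (new.host, new.port) == (stored.host, stored.port):
        lines.append("Hint: same host and port as the stored entry, so (1) or (3) is more likely.")
    else:
        lines.append("Hint: different host or port from the stored entry, so (2) is more likely.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(conflict_message(
        "/brokers/ids/0",
        "10.98.20.109-1317681202194:10.98.20.109:9092",
        "10.98.20.109-1317268078266:10.98.20.109:9092",
    ))
```

The hint line is deliberately worded as a likelihood: the node data alone cannot distinguish a quick restart from a ZK bug, so the message still lists all three causes.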

I think if we just improved the error message to explain this we would have happier users; as it stands, it requires fairly deep knowledge of ZK to understand why this happens.

-Jay

On Fri, Oct 7, 2011 at 7:35 AM, Mathias Herberts <mathias.herbe...@gmail.com> wrote:
> If you abort Kafka (killing the JVM for example) and restart it,
> depending on the zookeeper timeout you've used, it might occur that
> the ephemeral node created by the broker has not yet been removed by
> ZK.
>
> If this is the case, Kafka will detect that there is a znode conflict
> and kill itself.
>
> This is what your logs seem to imply:
>
> [2011-10-03 15:33:22,229] INFO conflict in /brokers/ids/0 data:
> 10.98.20.109-1317681202194:10.98.20.109:9092 stored data:
> 10.98.20.109-1317268078266:10.98.20.109:9092 (kafka.utils.ZkUtils$)
>
> Try to either wait for more than the ZK timeout prior to restarting
> Kafka, or lower the ZK timeout so the ephemeral node is indeed gone
> when you restart Kafka.
>
> Mathias.