It's likely that the larger buffer and fetch size triggered long GCs, which caused ZK session to timeout. You may want to do a bit GC tuning.
Thanks, Jun On Mon, Jun 11, 2012 at 10:12 AM, Aaron Rankin <aa...@sproutsocial.com>wrote: > This might help to explain the root cause. I found that two consumer > parameters may be correlated with the broker issues. Our setup is > inter-data-center and so we followed some of the advice on the mirroring > wiki page, > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring. In > particular, we increased "socket.buffersize" to 655360 and "fetch.size" to > 3072000 on our consumers. Since disabling those two parameters, allowing > the defaults to take effect, our brokers haven't died once in well over a > day. Prior, they were dying every hour. > > > On Jun 11, 2012, at 10:28 AM, Aaron Rankin wrote: > > > Jun, > > > > I was using the github mirror, which appears to be active. The last > commit there is the same as with the Apache Git mirror > (2a59ad76c657e4aad8ee6ca67078f49d2f6017c9). > > > > > > Aaron > > > > > > On Jun 11, 2012, at 12:05 AM, Jun Rao wrote: > > > >> Aaron, > >> > >> Which Git did you try, github or the Apache git mirror? Kafka has moved > to > >> Apache. So please try the 0.7 release in Apache. > >> > >> The error you saw are from ZK. Do you see lots of ZK session expiration > in > >> your log? > >> > >> Thanks, > >> > >> Jun > >> > >> On Sat, Jun 9, 2012 at 8:34 AM, Aaron Rankin <aa...@sproutsocial.com> > wrote: > >> > >>> Hi, > >>> > >>> We're testing Kafka and have found that the process dies often. There's > >>> little to no indication of why. We're running the latest code from Git, > >>> which we built using the instructions there. We're also running > Zookeeper > >>> 3.3.5. Our setup has three brokers, producers running on the same > network > >>> and consumers in another data center, a 30ms Internet ping away. > >>> > >>> Does anyone have some intuition about why this is happening? > >>> > >>> The only stack trace we're seeing is coming from Zookeeper: > >>> > >>> 1193285089 [CommitProcessor:2] ERROR > >>> org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception: > >>> java.nio.channels.CancelledKeyException > >>> at > sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) > >>> at > sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) > >>> at > >>> > org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) > >>> at > >>> > org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) > >>> at > >>> > org.apache.zookeeper.server.NIOServerCnxn.process(NIOServerCnxn.java:1545) > >>> at > >>> > org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:115) > >>> at > >>> > org.apache.zookeeper.server.WatchManager.triggerWatch(WatchManager.java:87) > >>> at > >>> org.apache.zookeeper.server.DataTree.deleteNode(DataTree.java:577) > >>> at > >>> org.apache.zookeeper.server.DataTree.killSession(DataTree.java:829) > >>> at > >>> org.apache.zookeeper.server.DataTree.processTxn(DataTree.java:804) > >>> at > >>> org.apache.zookeeper.server.ZKDatabase.processTxn(ZKDatabase.java:328) > >>> at > >>> > org.apache.zookeeper.server.ZooKeeperServer.processTxn(ZooKeeperServer.java:715) > >>> at > >>> > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:107) > >>> at > >>> > org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) > >>> > >>> > >>> Also, we constantly are seeing these in the logs: > >>> > >>> 1193365748 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] INFO > >>> org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection > for > >>> client /127.0.0.1:53426 (no session established for client) > >>> 1193425755 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] INFO > >>> org.apache.zookeeper.server.NIOServerCnxn - Accepted socket connection > >>> from /127.0.0.1:53428 > >>> 1193425755 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] WARN > >>> org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: > Unable > >>> to read additional data from client sessionid 0x0, likely client has > closed > >>> socket > >>> > >>> > >>> > >>> > > > >