[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683327#action_12683327 ]
Patrick Hunt commented on ZOOKEEPER-344:
----------------------------------------

Bryan, that's good info. It doesn't sound like zk server latency is the issue, then: you have an excess of cpu/memory for the tests you are running. It would still be good to verify this using jmx or the stat command.

If you can run with DEBUG logging enabled (server and client) it might give you more insight. Running at DEBUG level will also cause the stack of the "read error" you are seeing to be printed to the server log (zk version 3.1). If you can share all or part of the logs, please feel free to attach them to this JIRA.

It's probably this code in the server's doIO that is causing the server-side "read error" exception you are seeing:

    int rc = sock.read(incomingBuffer);
    if (rc < 0) {
        throw new IOException("Read error");
    }

read returns "The number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream". A negative return value here indicates to me that the client has closed the connection.

Also, looking at your logs, the client log is from 13:35 while the server log is from 13:06. Assuming the clocks are even fairly close, that is almost a 30 minute difference, so it's unlikely that these events are correlated.

My guess is that the client is closing the connection for some reason, but it would be interesting to see the debug logs (with server and client clocks that are fairly close, so that it is easier to correlate the log events).

Hope this helps.
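To illustrate the end-of-stream behavior described above, here is a minimal standalone sketch. It is not ZooKeeper code: the class name EofReadDemo and the method readAfterPeerClose are hypothetical, and it assumes Java 7 or later (try-with-resources, ServerSocketChannel.bind), which is newer than the jdk1.6 in the report. It opens a loopback connection, closes the client end without sending anything, and returns what the accepting side's read reports.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of ZooKeeper.
public class EofReadDemo {

    /**
     * Opens a loopback connection, closes the client end, and returns
     * the value the server-side read() reports afterwards.
     */
    static int readAfterPeerClose() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            // Blocking connect; completes once the connection is queued on the server.
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            try (SocketChannel accepted = server.accept()) {
                // The client closes cleanly without having sent any data.
                client.close();

                ByteBuffer incomingBuffer = ByteBuffer.allocate(4);
                // Blocking read on the accepted side; a closed peer is
                // reported as end-of-stream rather than as zero bytes.
                return accepted.read(incomingBuffer);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int rc = readAfterPeerClose();
        System.out.println("rc = " + rc);
        if (rc < 0) {
            // Same condition the server code above turns into "Read error".
            System.out.println("peer closed the connection (end-of-stream)");
        }
    }
}
```

A negative return value only says that the peer closed its end. It does not say why, which is why the DEBUG logs from both sides are still needed.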
> doIO in NioServerCnxn: Exception causing close of session : cause is "read error"
> ---------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-344
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client, server
>    Affects Versions: 3.1.0
>         Environment: jdk1.6.0_07
> Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: bryan thompson
>             Fix For: 3.2.0
>
>
> I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I
> see a lot of expired sessions. I am using a 16 node cluster which is all on
> the same local network. There is a single zookeeper instance (these are
> benchmarking runs).
>
> The problem appears to be correlated with either run time or system load.
> Personally I think that it is system load because I have session
> expired events under a Windows platform running zookeeper and the application
> (i.e., everything is local) when the application load suddenly spikes. To me
> this suggests that the client is not able to renew (ping) the zookeeper
> service in a timely manner and is expired. But the log messages below with
> the "read error" suggest that maybe there is something else going on?
>
> Zookeeper Configuration
>
> #Wed Mar 18 12:41:05 GMT-05:00 2009
> clientPort=2181
> dataDir=/var/bigdata/benchmark/zookeeper/1
> syncLimit=2
> dataLogDir=/var/bigdata/benchmark/zookeeper/1
> tickTime=2000
>
> Some representative log messages are below.
>
> Client side messages (from our app)
>
> ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
> ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
>
> Server side messages:
>
> WARN [NIOServerCxn.Factory:2181] org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 2009-03-18 13:06:57,252 - Exception causing close of session 0x1201aac14300022 due to java.io.IOException: Read error
> WARN [NIOServerCxn.Factory:2181] org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 2009-03-18 13:06:58,198 - Exception causing close of session 0x1201aac1430000f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.