[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-05-06 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706402#action_12706402
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Update: 

This issue is clearly linked to heavy utilization or swapping on the clients.  
I find that if I keep the clients from swapping that this error
materializes relatively infrequently, and when it does materialize it is linked 
to a sudden increase in load.  For example, the concurrent
start of 100 clients on 14 machines will sometimes trigger this issue.   I 
believe that the issue can be closed at this point with the note
that swapping will cause expired connections.  I also observe similar problems 
with jini / river, including cases where DGC (distributed
garbage collection) appears to fail.  All in all, it is my sense that Java 
processes must avoid swapping if they want to have not just timely
but also reliable behavior.

Thanks,

-bryan


 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-05-06 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706517#action_12706517
 ] 

Patrick Hunt commented on ZOOKEEPER-344:


Bryan, thanks for the update, I have updated the troubleshooting page with your 
feedback:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting


 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-04-02 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695136#action_12695136
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

I am not sure how to boil this down into a problem which can be run on a single 
machine.  This is a distributed database benchmark.  The problem shows up when 
the cluster is under load.  How would I go about isolating that further outside 
of writing stress tests for zookeeper?

If this is indeed a zookeeper bug and you have some idea of the possible issues 
involved, then perhaps you can suggest some additional instrumentation of 
zookeeper and I could run against a version with more instrumentation which 
might reveal something?

Thanks,

-bryan


 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-04-02 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695148#action_12695148
 ] 

Mahadev konar commented on ZOOKEEPER-344:
-

brayn,
 the one thing you can do is run with tracefile option in the config. Please 
take a look at
http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html to see how 
to set up a tracefile. The tracefile has logs of all the transactions that go 
though the server and all 
the components of the server and helps in debugging problems such as these, 
where you can point out which transaction got delayed and at what time and 
sometimes can point out
the reason why... 

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-04-01 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694660#action_12694660
 ] 

Mahadev konar commented on ZOOKEEPER-344:
-

for the first trace, closed != expired is true. it seems like the client 
reconnected to the server and didnt expire in the first case.

in the second case it did expire. 

You are on a windows machine right? and also its a single server ensemble? I 
dont usually use a windows box but I did notice once  that creating a log file 
with size of 64MB initially took 
a lot of time (5-15 seconds and was repeatable) on my windows box with no other 
load . Just to give you background, zookeeper takes snaphots and has 
transaction logs. The transaction logs 
are created with an initialize size of 64MB so that we do not have to update 
the size of the file repeatedly. This operation did take a lot of time on my 
windows machine and this did cause huge
latencies in my zookeeper clients and lots of timeouts and session expired. 

Ill take another look at your logs to see if i can find something that might 
explain the frequent session timeouts.

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-04-01 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694673#action_12694673
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

It is a Linux platform, which I describe above.  it is a standalone instance 
however rather than an ensemble.

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-31 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694060#action_12694060
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Here are some more stack traces with DEBUG on the server and the client for 
this issue.  The configuration, etc. is the same.  Logs were written onto an 
NFS share but the machines are synched with ntpd.

There are two distinct periods reported here.  One leads to a warning on the 
server but not to an expired session while the other issues the same warning on 
the server and leads to an expired session.

Here is a ping for sessionid 0x120597b6137000b shortly before the warning.

DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 17:34:33,643 - Processing request:: sessionid:0x120597b6137000b 
type:ping cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:137)
 2009-03-30 17:34:33,643 - sessionid:0x120597b6137000b type:ping 
cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [main-SendThread] 
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:548) 
2009-03-30 17:34:33,643 - Got ping response for sessionid:0x120597b6137000b 
after 1ms

Here is the Exception causing close of session.

 WARN [main-SendThread] 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:898) 2009-03-30 
17:34:48,120 - Exception closing session 0x120597b6137000b to 
sun.nio.ch.selectionkeyi...@7eb1cc87
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
 WARN [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
2009-03-30 17:34:48,166 - Exception causing close of session 0x120597b6137000b 
due to java.io.IOException: Read error
DEBUG [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:420) 
2009-03-30 17:34:48,166 - IOException stack trace
java.io.IOException: Read error
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:295)
at 
org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:162)
 INFO [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:752) 
2009-03-30 17:34:48,172 - closing session:0x120597b6137000b NIOServerCnxn: 
java.nio.channels.SocketChannel[connected local=/192.168.6.21:2181 
remote=/192.168.6.28:60720]

And here is appears that the closed session was re-initialized?  Perhaps closed 
!= expired?

INFO [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:881)
 2009-03-30 17:34:50,111 - Finished init of 0x120597b6137000b valid:true
 INFO [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:531)
 2009-03-30 17:34:50,111 - Renewing session 0x120597b6137000b
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 17:34:50,112 - Processing request:: sessionid:0x120597b6137000b 
type:setWatches cxid:0xfff8 zxid:0xfffe txntype:unknown
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:137)
 2009-03-30 17:34:50,112 - sessionid:0x120597b6137000b type:setWatches 
cxid:0xfff8 zxid:0xfffe txntype:unknown
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 17:34:50,121 - Processing request:: sessionid:0x120597b6137000b 
type:create cxid:0x8 zxid:0xb95 txntype:-1 n/a
DEBUG [main-SendThread] 
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:610) 
2009-03-30 17:34:50,126 - Reading reply sessionid:0x120597b6137000b, packet:: 
path:null finished:false header:: -8,101  replyHeader:: -8,2964,0  request:: 
2964,v{},v{'/benchmark/config/com.bigdata.service.jini.DataServer/logicalService11/masterElection_INVALID},v{}
  response:: null



Another trace that leads to an expired session:

DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 18:35:39,478 - Processing request:: sessionid:0x120597b61370008 
type:ping cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:137)
 2009-03-30 18:35:39,478 - sessionid:0x120597b61370008 type:ping 
cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [main-SendThread] 

[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-19 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683447#action_12683447
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Patrick, I did not try to coordinate the client and server logs but rather drew 
representative samples from each.  As far as I can tell it is more of the same 
in both logs.  However, I will correlate the events and the zxids and see if I 
can get that debug trace you suggested. -bryan

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-19 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683596#action_12683596
 ] 

Patrick Hunt commented on ZOOKEEPER-344:


That's fine. I guess what I mean is that it would be interesting to see the 
debug logs
for both the server and client at the time that the issue(s) start. We might get
more insight into what is happening if we can do that.


 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683181#action_12683181
 ] 

Patrick Hunt commented on ZOOKEEPER-344:


Hi Bryan, you might also try looking at some of the statistics using the stat 
command:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
this will give you insight on the min/max/avg latency of requests. You could 
also use JMX if that works for you:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperJMX.html

What is the timeout value you are using for your ZK clients? If your max 
latency is exceeding your client
timeouts then you will definitely see expirations.

Secondly review this section, specifically related to tranaction log placement 
and jdk memory (swapping) issues:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems
Either of these issues can cause performance to dip, and latencies to increase.

This information, along with a bit more detail on your benchmark would help 
you/us identify what's causing
these issues. Re your benchmark, how many operations/sec are you running? 
What's the read/write split?

Your zk server is a single quad-core x86_64 cpu, correct?

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683235#action_12683235
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Let me clarify a few things based on the other comments:

1. The sessionTimeout for the client was set to 2.

2. The zookeeper server is running on a host with very little total load (very 
little CPU utilization and very low disk write rates).  There is only one disk 
available for the zookeeper transaction log.  It is a SAS 10k spindle with a 
16M cache.

3. The zookeeper server process has 4G of RAM.

4. The benchmark is not a zookeeper benchmark, but a database benchmark.  
Zookeeper is being used for distributed locks and master elections.  There is 
relatively little activity for the zookeeper server.

I will modify the logged message to record the zxid and report back some 
correlated events.  

I will also report the output of the stat command from the server for several 
times during the run / JXM, which I've enabled.

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683236#action_12683236
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

I missed the question about the zk server.  It is an 8 core (2 quad core 
Opterons) 4x512k cache, 2.3Ghz clock with 32G ram.

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683327#action_12683327
 ] 

Patrick Hunt commented on ZOOKEEPER-344:


Bryan, that's good info. It doesn't sound like zk server latency is the issue 
then, you have an excess
of cpu/memory based on the tests you are running, however it will be good to 
verify using jmx or the 
stat command.

If you can run with DEBUG logging enabled (server and client) it might give you 
more insight. Also running
at DEBUG level will cause the stack of the read error you are seeing to be 
printed to the server log (zk
version 3.1). If you can share all/part of the logs please feel free to attach 
them to this JIRA.

It's probably this code in server doIO though that's causing the server side 
read error exception you are seeing:

int rc = sock.read(incomingBuffer);
if (rc  0) {
throw new IOException(Read error);
}

read returns The number of bytes read, possibly zero, or -1 if the channel has 
reached end-of-stream

this indicates to me that the client has closed the connection.

Also, looking at your logs the client log is from 13:35 while the server log is 
from 13:06, assuming that the 
clocks are even fairly close this is almost 30min difference, if true it's 
unlikely the events are correlated?

My guess is that the client is closing the connection for some reason, but it 
would be interesting to see
the debug logs (with clocks that are fairly close on server/client so it would 
be easier to correlate the log
events).

Hope this helps.


 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.\
 Personally I think that it is system load because I have session session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everthing is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.