[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-05-06 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12706402#action_12706402
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Update: 

This issue is clearly linked to heavy utilization or swapping on the clients.  
If I keep the clients from swapping, this error materializes relatively 
infrequently, and when it does materialize it is linked to a sudden increase 
in load.  For example, the concurrent start of 100 clients on 14 machines will 
sometimes trigger this issue.  I believe that the issue can be closed at this 
point with the note that swapping will cause expired connections.  I also 
observe similar problems with jini / river, including cases where DGC 
(distributed garbage collection) appears to fail.  All in all, it is my sense 
that Java processes must avoid swapping if they want to have not just timely 
but also reliable behavior.
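
The link between swapping and expired sessions can be checked from inside the client JVM. The following is a minimal sketch, not part of ZooKeeper or of the application described here: a hypothetical `PauseDetector` class that compares how long a periodic tick actually took with how long it should have taken. The class name, the interval, and the tolerance are all assumptions chosen for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: flags stalls (swapping, long GC) that could delay heartbeats. */
final class PauseDetector {
    private final long intervalNanos;   // how often observe() is expected to be called
    private final long toleranceNanos;  // overshoot that is still considered normal jitter
    private long lastTickNanos;

    PauseDetector(long intervalMillis, long toleranceMillis, long startNanos) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        this.toleranceNanos = TimeUnit.MILLISECONDS.toNanos(toleranceMillis);
        this.lastTickNanos = startNanos;
    }

    /**
     * Records a tick at the given monotonic time.
     * Returns the stall in milliseconds beyond the expected interval,
     * or 0 if the tick arrived within the tolerance.
     */
    long observe(long nowNanos) {
        long elapsed = nowNanos - lastTickNanos;
        lastTickNanos = nowNanos;
        long overshoot = elapsed - intervalNanos;
        return overshoot > toleranceNanos ? TimeUnit.NANOSECONDS.toMillis(overshoot) : 0L;
    }
}
```

A monitoring thread could sleep for the interval, call `observe(System.nanoTime())`, and log any non-zero result. Stalls that approach the session timeout would be consistent with the client missing its pings.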

Thanks,

-bryan


 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.
 Personally I think that it is system load because I have session 
 expired events under a Windows platform running zookeeper and the application 
 (i.e., everything is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-04-02 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695136#action_12695136
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

I am not sure how to boil this down into a problem which can be run on a single 
machine.  This is a distributed database benchmark.  The problem shows up when 
the cluster is under load.  How would I go about isolating that further outside 
of writing stress tests for zookeeper?

If this is indeed a zookeeper bug and you have some idea of the possible issues 
involved, then perhaps you can suggest some additional instrumentation of 
zookeeper and I could run against a version with more instrumentation which 
might reveal something?

Thanks,

-bryan





[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-04-01 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694673#action_12694673
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

It is a Linux platform, which I describe above.  It is a standalone instance, 
however, rather than an ensemble.




[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-31 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694060#action_12694060
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Here are some more stack traces with DEBUG on the server and the client for 
this issue.  The configuration, etc. is the same.  Logs were written onto an 
NFS share but the machines are synched with ntpd.

There are two distinct periods reported here.  One leads to a warning on the 
server but not to an expired session while the other issues the same warning on 
the server and leads to an expired session.

Here is a ping for sessionid 0x120597b6137000b shortly before the warning.

DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 17:34:33,643 - Processing request:: sessionid:0x120597b6137000b 
type:ping cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:137)
 2009-03-30 17:34:33,643 - sessionid:0x120597b6137000b type:ping 
cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [main-SendThread] 
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:548) 
2009-03-30 17:34:33,643 - Got ping response for sessionid:0x120597b6137000b 
after 1ms

Here is the Exception causing close of session.

 WARN [main-SendThread] 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:898) 2009-03-30 
17:34:48,120 - Exception closing session 0x120597b6137000b to 
sun.nio.ch.selectionkeyi...@7eb1cc87
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
 WARN [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
2009-03-30 17:34:48,166 - Exception causing close of session 0x120597b6137000b 
due to java.io.IOException: Read error
DEBUG [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:420) 
2009-03-30 17:34:48,166 - IOException stack trace
java.io.IOException: Read error
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:295)
at 
org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:162)
 INFO [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:752) 
2009-03-30 17:34:48,172 - closing session:0x120597b6137000b NIOServerCnxn: 
java.nio.channels.SocketChannel[connected local=/192.168.6.21:2181 
remote=/192.168.6.28:60720]

And here it appears that the closed session was re-initialized?  Perhaps closed 
!= expired?

INFO [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.finishSessionInit(NIOServerCnxn.java:881)
 2009-03-30 17:34:50,111 - Finished init of 0x120597b6137000b valid:true
 INFO [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:531)
 2009-03-30 17:34:50,111 - Renewing session 0x120597b6137000b
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 17:34:50,112 - Processing request:: sessionid:0x120597b6137000b 
type:setWatches cxid:0xfff8 zxid:0xfffe txntype:unknown
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:137)
 2009-03-30 17:34:50,112 - sessionid:0x120597b6137000b type:setWatches 
cxid:0xfff8 zxid:0xfffe txntype:unknown
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 17:34:50,121 - Processing request:: sessionid:0x120597b6137000b 
type:create cxid:0x8 zxid:0xb95 txntype:-1 n/a
DEBUG [main-SendThread] 
org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:610) 
2009-03-30 17:34:50,126 - Reading reply sessionid:0x120597b6137000b, packet:: 
path:null finished:false header:: -8,101  replyHeader:: -8,2964,0  request:: 
2964,v{},v{'/benchmark/config/com.bigdata.service.jini.DataServer/logicalService11/masterElection_INVALID},v{}
  response:: null
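
The trace above is consistent with a closed connection and an expired session being different things: the client timed out and dropped the connection at 17:34:48, reconnected, and the server renewed the same session at 17:34:50. A minimal model of that distinction follows. It assumes the server expires a session only when it has heard nothing from the client for longer than the session timeout; the real server tracks expiry on coarser tick boundaries, the session timeout is not shown in the trace, and `SessionOutcome` and any timeout values used with it are hypothetical.

```java
/** Hypothetical model: does a reconnect arrive in time to renew the session? */
final class SessionOutcome {
    enum Outcome { RENEWED, EXPIRED }

    /**
     * lastHeardMillis: when the server last heard from the client (e.g. a ping).
     * reconnectMillis: when the client reconnects with the same session id.
     * sessionTimeoutMillis: the session timeout in effect (an assumed value).
     */
    static Outcome afterReconnect(long lastHeardMillis,
                                  long reconnectMillis,
                                  long sessionTimeoutMillis) {
        long silence = reconnectMillis - lastHeardMillis;
        return silence <= sessionTimeoutMillis ? Outcome.RENEWED : Outcome.EXPIRED;
    }
}
```

In the trace, the last ping (17:34:33,643) and the renewal (17:34:50,111) are 16,468 ms apart. Under this model the session survives only if the timeout in effect was longer than that gap.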



Another trace that leads to an expired session:

DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
 2009-03-30 18:35:39,478 - Processing request:: sessionid:0x120597b61370008 
type:ping cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [SyncThread:0] 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:137)
 2009-03-30 18:35:39,478 - sessionid:0x120597b61370008 type:ping 
cxid:0xfffe zxid:0xfffe txntype:unknown n/a
DEBUG [main-SendThread] 

[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-19 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683447#action_12683447
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Patrick, I did not try to coordinate the client and server logs but rather drew 
representative samples from each.  As far as I can tell it is more of the same 
in both logs.  However, I will correlate the events and the zxids and see if I 
can get that debug trace you suggested. -bryan
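
Correlating the two logs can be done mechanically by grouping records on the session id. The sketch below is one hypothetical way to do that: `SessionLogIndex` is not part of ZooKeeper, the regular expression is inferred only from the log formats quoted in this issue, and it assumes each wrapped log record has first been joined into a single string. It uses Java 8 syntax, which is newer than the jdk1.6.0_07 listed in the environment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: groups client and server log records by session id. */
final class SessionLogIndex {
    // Matches "sessionid:0x...", "session 0x..." and "session:0x..." as seen in the quoted logs.
    private static final Pattern SESSION_ID =
            Pattern.compile("session(?:id)?[: ]\\s*(0x[0-9a-fA-F]+)");

    /** Returns records keyed by session id, in first-seen order; records without an id are skipped. */
    static Map<String, List<String>> bySession(List<String> records) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (String record : records) {
            Matcher m = SESSION_ID.matcher(record);
            if (m.find()) {
                index.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(record);
            }
        }
        return index;
    }
}
```

Feeding the merged client and server records through `bySession` and sorting each list by timestamp would put the ping, the close, and any renewal for one session side by side.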




[jira] Created: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)
doIO in NioServerCnxn: Exception causing close of session : cause is read 
error
-

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 x86_64 
x86_64 x86_64 GNU/Linux
Reporter: bryan thompson


I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
see a lot of expired sessions.  I am using a 16 node cluster which is all on 
the same local network.  There is a single zookeeper instance (these are 
benchmarking runs).
The problem appears to be correlated with either run time or system load.

Personally I think that it is system load because I have session 
expired events under a Windows platform running zookeeper and the application 
(i.e., everything is local) when the application load suddenly spikes.  To me 
this suggests that the client is not able to renew (ping) the zookeeper service 
in a timely manner and is expired.  But the log messages below with the read 
error suggest that maybe there is something else going on?

Zookeeper Configuration
#Wed Mar 18 12:41:05 GMT-05:00 2009
clientPort=2181
dataDir=/var/bigdata/benchmark/zookeeper/1
syncLimit=2
dataLogDir=/var/bigdata/benchmark/zookeeper/1
tickTime=2000
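
For reference, a sketch of how `tickTime` may bound the session timeout. It assumes the server clamps the timeout requested by the client to between 2 and 20 ticks, as the ZooKeeper programmer's guide describes; whether 3.1.0 applies exactly these bounds is an assumption, and `SessionTimeoutBounds` is a hypothetical helper, not a ZooKeeper class.

```java
/** Hypothetical model of how a server might bound a requested session timeout. */
final class SessionTimeoutBounds {
    /** Clamps the requested timeout to [2 * tickTime, 20 * tickTime] (assumed bounds). */
    static int negotiate(int requestedMillis, int tickTimeMillis) {
        int min = 2 * tickTimeMillis;
        int max = 20 * tickTimeMillis;
        return Math.max(min, Math.min(max, requestedMillis));
    }
}
```

Under that assumption, with tickTime=2000 the effective session timeout would fall between 4000 ms and 40000 ms regardless of what the client requests.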

Some representative log messages are below.

Client side messages (from our app)
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode

Server side messages:
 WARN [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
2009-03-18 13:06:57,252 - Exception causing close of session 0x1201aac14300022 
due to java.io.IOException: Read error
 WARN [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
2009-03-18 13:06:58,198 - Exception causing close of session 0x1201aac143f 
due to java.io.IOException: Read error





[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683235#action_12683235
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Let me clarify a few things based on the other comments:

1. The sessionTimeout for the client was set to 2.

2. The zookeeper server is running on a host with very little total load (very 
little CPU utilization and very low disk write rates).  There is only one disk 
available for the zookeeper transaction log.  It is a SAS 10k spindle with a 
16M cache.

3. The zookeeper server process has 4G of RAM.

4. The benchmark is not a zookeeper benchmark, but a database benchmark.  
Zookeeper is being used for distributed locks and master elections.  There is 
relatively little activity for the zookeeper server.

I will modify the logged message to record the zxid and report back some 
correlated events.  

I will also report the output of the stat command from the server at several 
times during the run, along with JMX, which I've enabled.




[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12683236#action_12683236
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

I missed the question about the zk server.  It is an 8 core (2 quad core 
Opterons) 4x512k cache, 2.3 GHz clock with 32G RAM.
