[jira] Created: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum

2009-03-18 Thread Patrick Hunt (JIRA)
regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum
---

 Key: ZOOKEEPER-341
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-341
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.1.1


ZOOKEEPER 330/336 caused a regression in QuorumPeerMain -- cannot reliably 
start a cluster due to missing tickTime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum

2009-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-341:
---

Attachment: ZOOKEEPER-341.patch

This patch removes the shadow tickTime so that the super can be accessed.
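
As a minimal sketch of the kind of bug this patch addresses (the class names below are hypothetical and are not the actual ZooKeeper classes): in Java, a field redeclared in a subclass hides the superclass field of the same name, so a value parsed into the subclass field never reaches code that reads the superclass field.

```java
// Hypothetical base class: code elsewhere reads tickTime through this class.
class BaseConfig {
    protected int tickTime = 0; // stays 0 if nothing assigns the base field

    int effectiveTickTime() {
        return tickTime; // always refers to BaseConfig.tickTime
    }
}

// Buggy variant: redeclaring tickTime hides the inherited field.
class ShadowingConfig extends BaseConfig {
    protected int tickTime; // shadows BaseConfig.tickTime

    void parse(int value) {
        tickTime = value; // writes only the subclass field
    }
}

// Fixed variant: no redeclaration, so the assignment reaches the base field.
class FixedConfig extends BaseConfig {
    void parse(int value) {
        tickTime = value; // writes BaseConfig.tickTime
    }
}

class ShadowDemo {
    public static void main(String[] args) {
        ShadowingConfig bad = new ShadowingConfig();
        bad.parse(2000);
        System.out.println(bad.effectiveTickTime()); // prints 0

        FixedConfig good = new FixedConfig();
        good.parse(2000);
        System.out.println(good.effectiveTickTime()); // prints 2000
    }
}
```

Removing the redeclared field, as in FixedConfig, is the shape of fix the patch comment describes.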


 regression in QuorumPeerMain, tickTime from config is lost, cannot start 
 quorum
 ---

 Key: ZOOKEEPER-341
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-341
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.1.1, 3.2.0

 Attachments: ZOOKEEPER-341.patch


 ZOOKEEPER 330/336 caused a regression in QuorumPeerMain -- cannot reliably 
 start a cluster due to missing tickTime.




[jira] Updated: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum

2009-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-341:
---

Fix Version/s: 3.2.0

 regression in QuorumPeerMain, tickTime from config is lost, cannot start 
 quorum
 ---

 Key: ZOOKEEPER-341
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-341
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Patrick Hunt
Priority: Blocker
 Fix For: 3.1.1, 3.2.0

 Attachments: ZOOKEEPER-341.patch


 ZOOKEEPER 330/336 caused a regression in QuorumPeerMain -- cannot reliably 
 start a cluster due to missing tickTime.




[jira] Created: (ZOOKEEPER-342) improve configuration code - remove static config and use java properties

2009-03-18 Thread Patrick Hunt (JIRA)
improve configuration code - remove static config and use java properties
-

 Key: ZOOKEEPER-342
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-342
 Project: Zookeeper
  Issue Type: Improvement
  Components: server
Reporter: Patrick Hunt
 Fix For: 3.2.0


The current server/quorum config classes are essentially global variables. Need 
to fix configuration parsing, remove use of essentially global vars (static) 
and also clean up the code generally.

Add tests specific to configuration parsing.
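
One possible shape for this, as a sketch only: the class and method names below are hypothetical, and only the keys tickTime, clientPort and dataDir are taken from ZooKeeper configuration files. The parsed values live in an immutable instance rather than in static fields, which also makes parsing easy to test in isolation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical config holder: instance fields instead of static globals.
final class ServerConfigSketch {
    final int tickTime;
    final int clientPort;
    final String dataDir;

    private ServerConfigSketch(int tickTime, int clientPort, String dataDir) {
        this.tickTime = tickTime;
        this.clientPort = clientPort;
        this.dataDir = dataDir;
    }

    // Fails fast on a missing key instead of silently using a default.
    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException(key + " is not set");
        }
        return value.trim();
    }

    static ServerConfigSketch parse(Properties props) {
        return new ServerConfigSketch(
                Integer.parseInt(required(props, "tickTime")),
                Integer.parseInt(required(props, "clientPort")),
                required(props, "dataDir"));
    }

    // Convenience for tests: parse java.util.Properties syntax from a string.
    static ServerConfigSketch fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parse(props);
    }
}

class ConfigDemo {
    public static void main(String[] args) {
        ServerConfigSketch cfg = ServerConfigSketch.fromText(
                "tickTime=2000\nclientPort=2181\ndataDir=/tmp/zk\n");
        System.out.println(cfg.tickTime + " " + cfg.clientPort + " " + cfg.dataDir);
    }
}
```

A parsing test can then build a config from a string and assert on the fields, with no process-wide state to reset between tests.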





[jira] Created: (ZOOKEEPER-343) add tests that specifically verify the zkmain and qpmain classes

2009-03-18 Thread Patrick Hunt (JIRA)
add tests that specifically verify the zkmain and qpmain classes


 Key: ZOOKEEPER-343
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-343
 Project: Zookeeper
  Issue Type: Improvement
  Components: tests
Reporter: Patrick Hunt
 Fix For: 3.2.0


We are missing tests for these two main() routines.

Add tests that verify standalone and quorum (2 servers is probably enough) by 
starting and connecting a client.

Use on-disk configuration files to configure these.
(ie verify starting with actual config files)




[jira] Resolved: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum

2009-03-18 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-341.
-

  Resolution: Fixed
Assignee: Patrick Hunt
Hadoop Flags: [Reviewed]

+1 ... I just committed this.

thanks pat.

 regression in QuorumPeerMain, tickTime from config is lost, cannot start 
 quorum
 ---

 Key: ZOOKEEPER-341
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-341
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum, server
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Priority: Blocker
 Fix For: 3.1.1, 3.2.0

 Attachments: ZOOKEEPER-341.patch


 ZOOKEEPER 330/336 caused a regression in QuorumPeerMain -- cannot reliably 
 start a cluster due to missing tickTime.




[VOTE] Release ZooKeeper 3.1.1 (candidate 1)

2009-03-18 Thread Patrick Hunt
I've created a new candidate (rc1) that fixes a regression found during 
review:

https://issues.apache.org/jira/browse/ZOOKEEPER-341
The release notes were also updated to reflect this change.

Otherwise there are no other changes.

*** Please download, test and VOTE before the
*** vote closes EOD on Monday March 23.***

http://people.apache.org/~phunt/zookeeper-3.1.1-candidate-1/

Should we release this?

Patrick




[jira] Assigned: (ZOOKEEPER-337) improve logging in leader election lookForLeader method when address resolution fails

2009-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-337:
--

Assignee: Patrick Hunt

 improve logging in leader election lookForLeader method when address 
 resolution fails
 -

 Key: ZOOKEEPER-337
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-337
 Project: Zookeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.2.0


 Leader election has the following code:
 requestPacket.setSocketAddress(server.addr);
 LOG.info("Server address: " + server.addr);
 This should be switched to log the address first and set the socket address second.
 The reason is that if setSocketAddress fails, the Sun JDK does not print 
 the address used. As a result it's very difficult to debug this issue.
 If we log the server address first, then if setSocketAddress fails we'll see 
 both the address of the server and the exception detail (right now we just 
 see the exception detail, which does not include the invalid address in 
 the exception message).
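
 The reordering described above can be sketched as follows. This is an illustration, not the ZooKeeper source: the class name and the list used as a stand-in logger are hypothetical. It relies on java.net.DatagramPacket.setSocketAddress throwing IllegalArgumentException for an unresolved address, with a message that does not name the address.

```java
import java.net.DatagramPacket;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class LogBeforeSetDemo {
    // Returns the log lines produced while preparing a packet for addr.
    // A plain list stands in for the real logger so the order is visible.
    static List<String> prepare(InetSocketAddress addr) {
        List<String> log = new ArrayList<>();
        DatagramPacket requestPacket = new DatagramPacket(new byte[4], 4);

        // Log first, so the address is recorded even if the next call throws.
        log.add("Server address: " + addr);
        try {
            requestPacket.setSocketAddress(addr);
        } catch (IllegalArgumentException e) {
            // The exception message alone does not include the address.
            log.add("Failed to set socket address: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        // createUnresolved does no DNS lookup, so this runs offline.
        InetSocketAddress bad =
                InetSocketAddress.createUnresolved("zk1.invalid", 2888);
        for (String line : prepare(bad)) {
            System.out.println(line);
        }
    }
}
```

 With this ordering the failing host name appears in the line immediately before the exception detail.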




[jira] Created: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)
doIO in NioServerCnxn: Exception causing close of session : cause is read 
error
-

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 x86_64 
x86_64 x86_64 GNU/Linux
Reporter: bryan thompson


I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
see a lot of expired sessions.  I am using a 16 node cluster which is all on 
the same local network.  There is a single zookeeper instance (these are 
benchmarking runs).
The problem appears to be correlated with either run time or system load.

Personally I think that it is system load because I have session 
expired events under a Windows platform running zookeeper and the application 
(i.e., everything is local) when the application load suddenly spikes.  To me 
this suggests that the client is not able to renew (ping) the zookeeper service 
in a timely manner and is expired.  But the log messages below with the read 
error suggest that maybe there is something else going on?

Zookeeper Configuration
#Wed Mar 18 12:41:05 GMT-05:00 2009
clientPort=2181
dataDir=/var/bigdata/benchmark/zookeeper/1
syncLimit=2
dataLogDir=/var/bigdata/benchmark/zookeeper/1
tickTime=2000

Some representative log messages are below.

Client side messages (from our app)
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode

Server side messages:
 WARN [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
2009-03-18 13:06:57,252 - Exception causing close of session 0x1201aac14300022 
due to java.io.IOException: Read error
 WARN [NIOServerCxn.Factory:2181] 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
2009-03-18 13:06:58,198 - Exception causing close of session 0x1201aac143f 
due to java.io.IOException: Read error





[jira] Issue Comment Edited: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683158#action_12683158
 ] 

Mahadev konar edited comment on ZOOKEEPER-344 at 3/18/09 2:16 PM:
--

{noformat}
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
{noformat}

Can you post the corresponding session IDs for these?

Also, please post the logs related to their session closing, with timestamps.

  was (Author: mahadev):
{noformat}
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
{noformat}

can you post  corresponding session id's with these? 
  
 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson




[jira] Issue Comment Edited: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683158#action_12683158
 ] 

Mahadev konar edited comment on ZOOKEEPER-344 at 3/18/09 2:17 PM:
--

{noformat}
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
{noformat}

Can you post the corresponding session IDs for these?

Also, please post the logs related to their session closing, with timestamps 
(on the server side).

  was (Author: mahadev):
{noformat}
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] 
com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
New state: Expired : 
zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
{noformat}

can you post  corresponding session id's with these ? 

and also the logs related to their session closing with the timestamps.
  
 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson




[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683181#action_12683181
 ] 

Patrick Hunt commented on ZOOKEEPER-344:


Hi Bryan, you might also try looking at some of the statistics using the stat 
command:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
this will give you insight on the min/max/avg latency of requests. You could 
also use JMX if that works for you:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperJMX.html

What is the timeout value you are using for your ZK clients? If your max 
latency is exceeding your client
timeouts then you will definitely see expirations.
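
To make the comparison concrete, here is a small sketch. It rests on an assumption that should be checked against the release in use: that the server clamps the client's requested session timeout to between 2 and 20 times tickTime, as ZooKeeper documentation describes. The class and method names are hypothetical; tickTime=2000 is taken from the configuration in the report.

```java
final class SessionTimeoutSketch {
    // Assumed server-side rule: clamp the request to [2 * tickTime, 20 * tickTime].
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    // A request stalled longer than the session timeout risks expiration.
    static boolean atRiskOfExpiry(long maxLatencyMs, int negotiatedMs) {
        return maxLatencyMs >= negotiatedMs;
    }

    public static void main(String[] args) {
        int tickTime = 2000; // from the reported zoo.cfg
        System.out.println(negotiatedTimeoutMs(30000, tickTime)); // prints 30000
        System.out.println(negotiatedTimeoutMs(60000, tickTime)); // prints 40000
        System.out.println(negotiatedTimeoutMs(1000, tickTime));  // prints 4000
        System.out.println(atRiskOfExpiry(5000, 4000));           // prints true
    }
}
```

The max latency reported by the stat command can be fed into atRiskOfExpiry alongside the timeout the client actually negotiated.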

Secondly, review this section, specifically the parts on transaction log placement 
and JDK memory (swapping) issues:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems
Either of these issues can cause performance to dip and latencies to increase.

This information, along with a bit more detail on your benchmark would help 
you/us identify what's causing
these issues. Re your benchmark, how many operations/sec are you running? 
What's the read/write split?

Your zk server is a single quad-core x86_64 cpu, correct?

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0





[jira] Updated: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-344:
---

  Component/s: server
Fix Version/s: 3.2.0

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0





[jira] Assigned: (ZOOKEEPER-60) Get cppunit tests running as part of Hudson CI

2009-03-18 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan reassigned ZOOKEEPER-60:
---

Assignee: Giridharan Kesavan

 Get cppunit tests running as part of Hudson CI
 --

 Key: ZOOKEEPER-60
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-60
 Project: Zookeeper
  Issue Type: Improvement
  Components: build
Reporter: Patrick Hunt
Assignee: Giridharan Kesavan

 Investigate if it is possible to run cppunit tests as part of Hudson.




[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683235#action_12683235
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

Let me clarify a few things based on the other comments:

1. The sessionTimeout for the client was set to 2.

2. The zookeeper server is running on a host with very little total load (very 
little CPU utilization and very low disk write rates).  There is only one disk 
available for the zookeeper transaction log.  It is a SAS 10k spindle with a 
16M cache.

3. The zookeeper server process has 4G of RAM.

4. The benchmark is not a zookeeper benchmark, but a database benchmark.  
Zookeeper is being used for distributed locks and master elections.  There is 
relatively little activity for the zookeeper server.

I will modify the logged message to record the zxid and report back some 
correlated events.  

I will also report the output of the stat command from the server at several 
times during the run, along with JMX, which I've enabled.

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0





[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread bryan thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683236#action_12683236
 ] 

bryan thompson commented on ZOOKEEPER-344:
--

I missed the question about the zk server.  It is an 8 core (2 quad core 
Opterons) 4x512k cache, 2.3GHz clock with 32G RAM.

 doIO in NioServerCnxn: Exception causing close of session : cause is read 
 error
 -

 Key: ZOOKEEPER-344
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
 Project: Zookeeper
  Issue Type: Bug
  Components: java client, server
Affects Versions: 3.1.0
 Environment: jdk1.6.0_07
 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson
 Fix For: 3.2.0


 I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I 
 see a lot of expired sessions.  I am using a 16 node cluster which is all on 
 the same local network.  There is a single zookeeper instance (these are 
 benchmarking runs).
 The problem appears to be correlated with either run time or system load.
 Personally I think that it is system load because I also get session 
 expired events on a Windows platform running zookeeper and the application 
 (i.e., everything is local) when the application load suddenly spikes.  To me 
 this suggests that the client is not able to renew (ping) the zookeeper 
 service in a timely manner and is expired.  But the log messages below with 
 the read error suggest that maybe there is something else going on?
 Zookeeper Configuration
 #Wed Mar 18 12:41:05 GMT-05:00 2009
 clientPort=2181
 dataDir=/var/bigdata/benchmark/zookeeper/1
 syncLimit=2
 dataLogDir=/var/bigdata/benchmark/zookeeper/1
 tickTime=2000
 Some representative log messages are below.
 Client side messages (from our app)
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
 ERROR [main-EventThread] 
 com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 
 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. 
 New state: Expired : 
 zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
 Server side messages:
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:57,252 - Exception causing close of session 
 0x1201aac14300022 due to java.io.IOException: Read error
  WARN [NIOServerCxn.Factory:2181] 
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 
 2009-03-18 13:06:58,198 - Exception causing close of session 
 0x1201aac143f due to java.io.IOException: Read error




[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error

2009-03-18 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683327#action_12683327
 ] 

Patrick Hunt commented on ZOOKEEPER-344:


Bryan, that's good info. It doesn't sound like zk server latency is the issue 
then: you have an excess of cpu/memory for the tests you are running. However, 
it would still be good to verify this using jmx or the stat command.
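
As a rough illustration of the stat check, here is a minimal sketch that 
sends the four-letter "stat" command to the client port and prints the reply. 
The class name ZkStat, the helper fourLetterWord, and the 5 second timeout are 
made up for this example; the host and port defaults assume a server on 
localhost:2181 as in the configuration quoted in the issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ZooKeeper itself.
public class ZkStat {

    /** Sends a four-letter command and returns everything the server writes before it closes. */
    public static String fourLetterWord(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            sock.setSoTimeout(timeoutMs);

            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until end-of-stream; the server closes the socket after replying.
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: a single server on localhost:2181.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.print(fourLetterWord(host, port, "stat", 5000));
    }
}
```

Running this a few times during a benchmark run should show whether the 
latency figures the server reports change under load.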

If you can run with DEBUG logging enabled (server and client) it might give you 
more insight. Running at DEBUG level will also cause the stack trace of the 
read error you are seeing to be printed to the server log (zk version 3.1). If 
you can share all/part of the logs please feel free to attach them to this JIRA.

It's probably this code in server doIO though that's causing the server side 
read error exception you are seeing:

int rc = sock.read(incomingBuffer);
if (rc < 0) {
    throw new IOException("Read error");
}

read returns "The number of bytes read, possibly zero, or -1 if the channel has 
reached end-of-stream"

this indicates to me that the client has closed the connection.
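
To make the end-of-stream case concrete, here is a small self-contained sketch 
(not ZooKeeper code; the class name EofReadDemo and its method are invented 
for illustration). It opens a loopback connection, closes the client side 
without sending anything, and returns what the server-side read() reports.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Illustration only: shows read() reporting end-of-stream after the peer closes.
public class EofReadDemo {

    /** Returns the value of the server-side read() after the client has closed. */
    public static int readAfterClientClose() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            try (SocketChannel accepted = server.accept()) {
                client.close(); // the client goes away without sending any data

                ByteBuffer incomingBuffer = ByteBuffer.allocate(4);
                // Blocking read: returns once the close is seen by the server side.
                return accepted.read(incomingBuffer);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read() returned " + readAfterClientClose());
    }
}
```

With a check like the one in doIO, that return value is what ends up being 
reported as "Read error", even though nothing went wrong on the server side.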

Also, looking at your logs, the client log is from 13:35 while the server log is 
from 13:06. Assuming that the clocks are even fairly close, this is almost a 
30 min difference; if so, it's unlikely the events are correlated.
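
For reference, the gap between the first server-side warning and the client 
expirations can be computed directly from the quoted timestamps. This is only 
a sketch: the class name LogGap is made up, and it assumes both logs use the 
same "yyyy-MM-dd HH:mm:ss,SSS" layout and the same time zone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustration only: measures the distance between two log timestamps.
public class LogGap {

    // Timestamp layout as it appears in the quoted log lines.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the time elapsed from the earlier timestamp to the later one. */
    public static Duration gap(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TS),
                LocalDateTime.parse(later, LOG_TS));
    }

    public static void main(String[] args) {
        Duration d = gap("2009-03-18 13:06:57,252",   // server: first read error
                         "2009-03-18 13:35:40,335");  // client: session expired
        System.out.println(d.toMinutes() + " min " + (d.getSeconds() % 60) + " s");
    }
}
```

For these two entries the difference comes to 28 minutes and 43 seconds.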

My guess is that the client is closing the connection for some reason, but it 
would be interesting to see
the debug logs (with clocks that are fairly close on server/client so it would 
be easier to correlate the log
events).

Hope this helps.


