[jira] Created: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum
regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum
---

Key: ZOOKEEPER-341
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-341
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Reporter: Patrick Hunt
Priority: Blocker
Fix For: 3.1.1

ZOOKEEPER 330/336 caused a regression in QuorumPeerMain -- cannot reliably start a cluster due to missing tickTime.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-341:
---
Attachment: ZOOKEEPER-341.patch

This patch removes the shadowing tickTime field so that the tickTime in the superclass can be accessed.
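For readers unfamiliar with this class of bug, the following is a minimal sketch of Java field shadowing, which is what the patch comment appears to describe. The class names (BaseConfig, ShadowingConfig, FixedConfig) are hypothetical and are not the actual ZooKeeper classes; the sketch only assumes that a subclass redeclared a tickTime field that the parser had populated in the superclass.

```java
// Hypothetical classes for illustration only; not the ZooKeeper sources.
class BaseConfig {
    protected int tickTime;            // populated by the config parser

    void parse(int configuredTickTime) {
        this.tickTime = configuredTickTime;
    }
}

class ShadowingConfig extends BaseConfig {
    protected int tickTime;            // shadows BaseConfig.tickTime and stays 0

    int effectiveTickTime() {
        return tickTime;               // reads the shadow, not the parsed value
    }
}

class FixedConfig extends BaseConfig {
    // No redeclaration, so the inherited field is the one that gets read.
    int effectiveTickTime() {
        return tickTime;
    }
}

class ShadowDemo {
    public static void main(String[] args) {
        ShadowingConfig broken = new ShadowingConfig();
        broken.parse(2000);
        System.out.println("shadowed: " + broken.effectiveTickTime());

        FixedConfig fixed = new FixedConfig();
        fixed.parse(2000);
        System.out.println("fixed: " + fixed.effectiveTickTime());
    }
}
```

With the shadowing field in place the configured value never reaches the subclass (the demo prints 0 for it); without the redeclaration the configured 2000 is visible.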
[jira] Updated: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-341:
---
Fix Version/s: 3.2.0
[jira] Created: (ZOOKEEPER-342) improve configuration code - remove static config and use java properties
improve configuration code - remove static config and use java properties
-

Key: ZOOKEEPER-342
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-342
Project: Zookeeper
Issue Type: Improvement
Components: server
Reporter: Patrick Hunt
Fix For: 3.2.0

The current server/quorum config classes are essentially global variables. We need to fix configuration parsing, remove the use of these essentially global (static) variables, and clean up the code generally. Add tests specific to configuration parsing.
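As a rough sketch of the direction this issue proposes (instance state instead of statics, parsed via java.util.Properties), something like the following could work. The class name ServerSettings and its methods are hypothetical and do not reflect the code that was eventually committed; only the keys clientPort, dataDir and tickTime are taken from ZooKeeper config files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical immutable config holder; not the actual ZooKeeper class.
final class ServerSettings {
    final int clientPort;
    final String dataDir;
    final int tickTime;

    private ServerSettings(int clientPort, String dataDir, int tickTime) {
        this.clientPort = clientPort;
        this.dataDir = dataDir;
        this.tickTime = tickTime;
    }

    // Builds an instance from parsed properties; no static state is touched.
    static ServerSettings fromProperties(Properties props) {
        return new ServerSettings(
                Integer.parseInt(required(props, "clientPort")),
                required(props, "dataDir"),
                Integer.parseInt(required(props, "tickTime")));
    }

    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing config key: " + key);
        }
        return value.trim();
    }
}

class ServerSettingsDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // In a real server this would be a FileReader over the config file.
        props.load(new StringReader(
                "clientPort=2181\ndataDir=/tmp/zookeeper\ntickTime=2000\n"));
        ServerSettings s = ServerSettings.fromProperties(props);
        System.out.println(s.clientPort + " " + s.dataDir + " " + s.tickTime);
    }
}
```

Because each ServerSettings is an ordinary object, a test can build several different configurations in one JVM, which is hard to do when the values live in static fields.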
[jira] Created: (ZOOKEEPER-343) add tests that specifically verify the zkmain and qpmain classes
add tests that specifically verify the zkmain and qpmain classes
-

Key: ZOOKEEPER-343
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-343
Project: Zookeeper
Issue Type: Improvement
Components: tests
Reporter: Patrick Hunt
Fix For: 3.2.0

We are missing tests for these two main() routines. Add tests that verify standalone and quorum (2 servers is probably enough) by starting the servers and connecting a client. Use on-disk configuration files to configure these (i.e., verify startup with actual config files).
[jira] Resolved: (ZOOKEEPER-341) regression in QuorumPeerMain, tickTime from config is lost, cannot start quorum
[ https://issues.apache.org/jira/browse/ZOOKEEPER-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar resolved ZOOKEEPER-341.
-
Resolution: Fixed
Assignee: Patrick Hunt
Hadoop Flags: [Reviewed]

+1 ... I just committed this. thanks pat.
[VOTE] Release ZooKeeper 3.1.1 (candidate 1)
I've created a new candidate (rc1) that fixes a regression found during review:
https://issues.apache.org/jira/browse/ZOOKEEPER-341

The release notes were also updated to reflect this change. Otherwise there are no other changes.

*** Please download, test and VOTE before the
*** vote closes EOD on Monday March 23.***

http://people.apache.org/~phunt/zookeeper-3.1.1-candidate-1/

Should we release this?

Patrick
[jira] Assigned: (ZOOKEEPER-337) improve logging in leader election lookForLeader method when address resolution fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-337:
--
Assignee: Patrick Hunt

improve logging in leader election lookForLeader method when address resolution fails
-

Key: ZOOKEEPER-337
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-337
Project: Zookeeper
Issue Type: Improvement
Components: quorum
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.2.0

leader election has the following code:

    requestPacket.setSocketAddress(server.addr);
    LOG.info("Server address: " + server.addr);

This should be switched to have the info log first and set the socket address second. The reason is that if setSocketAddress fails, Sun's JDK does not print the address used, which makes this issue very difficult to debug. If we log the server address first, then when setSocketAddress fails we'll see both the address of the server and the exception detail (right now we just see the exception detail, which does not include the invalid address in invalidaddressexception).
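The reordering proposed in ZOOKEEPER-337 can be sketched as follows. This is not the ZooKeeper code: it assumes requestPacket is a java.net.DatagramPacket, uses a plain list as a stand-in for the logger, and uses a deliberately unresolved address (the host name no-such-host.invalid is made up) to trigger the failure.

```java
import java.net.DatagramPacket;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class LogBeforeSetAddress {
    // Stand-in for a real logger so the sketch stays self-contained.
    static final List<String> LOG = new ArrayList<>();

    static void prepare(DatagramPacket requestPacket, InetSocketAddress addr) {
        // Log first: if the next call throws, the address is already on record.
        LOG.add("Server address: " + addr);
        // Throws IllegalArgumentException for an unresolved address.
        requestPacket.setSocketAddress(addr);
    }

    public static void main(String[] args) {
        DatagramPacket packet = new DatagramPacket(new byte[4], 4);
        InetSocketAddress bad =
                InetSocketAddress.createUnresolved("no-such-host.invalid", 2888);
        try {
            prepare(packet, bad);
        } catch (IllegalArgumentException e) {
            // The exception text alone does not identify the host.
            System.out.println("exception: " + e.getMessage());
            System.out.println("last log line: " + LOG.get(LOG.size() - 1));
        }
    }
}
```

With the original ordering the log line would never be written, leaving only the exception message to debug from.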
[jira] Created: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
doIO in NioServerCnxn: Exception causing close of session : cause is read error
-

Key: ZOOKEEPER-344
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-344
Project: Zookeeper
Issue Type: Bug
Components: java client
Affects Versions: 3.1.0
Environment: jdk1.6.0_07 Linux blade2 2.6.27.7-134.fc10.x86_64 #1 SMP Mon Dec 1 22:21:35 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Reporter: bryan thompson

I have been having a problem with zookeeper 3.0.1 and now with 3.1.0 where I see a lot of expired sessions. I am using a 16 node cluster which is all on the same local network. There is a single zookeeper instance (these are benchmarking runs).

The problem appears to be correlated with either run time or system load. Personally I think that it is system load, because I have also seen session expired events under a Windows platform running zookeeper and the application (i.e., everything is local) when the application load suddenly spikes. To me this suggests that the client is not able to renew (ping) the zookeeper service in a timely manner and is expired. But the log messages below with the read error suggest that maybe there is something else going on?

Zookeeper Configuration

#Wed Mar 18 12:41:05 GMT-05:00 2009
clientPort=2181
dataDir=/var/bigdata/benchmark/zookeeper/1
syncLimit=2
dataLogDir=/var/bigdata/benchmark/zookeeper/1
tickTime=2000

Some representative log messages are below.

Client side messages (from our app)

ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode

Server side messages:

WARN [NIOServerCxn.Factory:2181] org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 2009-03-18 13:06:57,252 - Exception causing close of session 0x1201aac14300022 due to java.io.IOException: Read error
WARN [NIOServerCxn.Factory:2181] org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:417) 2009-03-18 13:06:58,198 - Exception causing close of session 0x1201aac143f due to java.io.IOException: Read error
[jira] Issue Comment Edited: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683158#action_12683158 ]

Mahadev konar edited comment on ZOOKEEPER-344 at 3/18/09 2:16 PM:
--

{noformat}
ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
{noformat}

can you post corresponding session id's with these ? and also the logs related to their session closing with the timestamps.
[jira] Issue Comment Edited: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683158#action_12683158 ]

Mahadev konar edited comment on ZOOKEEPER-344 at 3/18/09 2:17 PM:
--

{noformat}
ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1160/locknode
ERROR [main-EventThread] com.bigdata.zookeeper.ZLockImpl$ZLockWatcher.process(ZLockImpl.java:400) 2009-03-18 13:35:40,335 - Session expired: WatchedEvent: Server state change. New state: Expired : zpath=/benchmark/jobs/com.bigdata.service.jini.benchmark.ThroughputMaster/test_1/client1356/locknode
{noformat}

can you post corresponding session id's with these ? and also the logs related to their session closing with the timestamps (on the server side).
[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683181#action_12683181 ]

Patrick Hunt commented on ZOOKEEPER-344:

Hi Bryan, you might also try looking at some of the statistics using the stat command:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
This will give you insight into the min/max/avg latency of requests. You could also use JMX if that works for you:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperJMX.html

What is the timeout value you are using for your ZK clients? If your max latency is exceeding your client timeouts then you will definitely see expirations.

Secondly, review this section, specifically the parts related to transaction log placement and JDK memory (swapping) issues:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems
Either of these issues can cause performance to dip and latencies to increase.

This information, along with a bit more detail on your benchmark, would help you/us identify what's causing these issues. Re your benchmark, how many operations/sec are you running? What's the read/write split? Your zk server is a single quad-core x86_64 cpu, correct?
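The check suggested in the comment above (compare the server's max request latency with the client session timeout) might be scripted roughly as below. This is only a sketch: the class and method names are hypothetical, and it assumes the stat reply contains a line of the form "Latency min/avg/max: a/b/c" with values in milliseconds and that the server closes the connection after replying.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class StatProbe {
    // Sends a four-letter command (for example "stat") and returns the reply.
    static String fourLetterWord(String host, int port, String cmd) throws IOException {
        try (Socket sock = new Socket(host, port)) {
            sock.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            sock.getOutputStream().flush();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            InputStream in = sock.getInputStream();
            byte[] buf = new byte[1024];
            int n;
            // Assumes the server closes the connection once it has replied.
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Assumed line format: "Latency min/avg/max: 0/1/25". Returns -1 if absent.
    static long maxLatencyMillis(String statOutput) {
        for (String line : statOutput.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Latency min/avg/max:")) {
                String values = trimmed.substring(trimmed.indexOf(':') + 1).trim();
                return Long.parseLong(values.split("/")[2].trim());
            }
        }
        return -1;
    }

    static boolean latencyExceedsTimeout(String statOutput, long sessionTimeoutMillis) {
        return maxLatencyMillis(statOutput) > sessionTimeoutMillis;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StatProbe <host> <port> <sessionTimeoutMillis>
        String out = fourLetterWord(args[0], Integer.parseInt(args[1]), "stat");
        System.out.println(out);
        System.out.println("max latency exceeds timeout: "
                + latencyExceedsTimeout(out, Long.parseLong(args[2])));
    }
}
```

Running it several times during a benchmark run would show whether max latency ever approaches the session timeout.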
[jira] Updated: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-344:
---
Component/s: server
Fix Version/s: 3.2.0
[jira] Assigned: (ZOOKEEPER-60) Get cppunit tests running as part of Hudson CI
[ https://issues.apache.org/jira/browse/ZOOKEEPER-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giridharan Kesavan reassigned ZOOKEEPER-60:
---
Assignee: Giridharan Kesavan

Get cppunit tests running as part of Hudson CI
--

Key: ZOOKEEPER-60
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-60
Project: Zookeeper
Issue Type: Improvement
Components: build
Reporter: Patrick Hunt
Assignee: Giridharan Kesavan

Investigate if it is possible to run cppunit tests as part of Hudson.
[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683235#action_12683235 ]

bryan thompson commented on ZOOKEEPER-344:
--

Let me clarify a few things based on the other comments:

1. The sessionTimeout for the client was set to 2.
2. The zookeeper server is running on a host with very little total load (very little CPU utilization and very low disk write rates). There is only one disk available for the zookeeper transaction log. It is a SAS 10k spindle with a 16M cache.
3. The zookeeper server process has 4G of RAM.
4. The benchmark is not a zookeeper benchmark, but a database benchmark. Zookeeper is being used for distributed locks and master elections. There is relatively little activity for the zookeeper server.

I will modify the logged message to record the zxid and report back some correlated events. I will also report the output of the stat command from the server at several times during the run, along with JMX, which I've enabled.
[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683236#action_12683236 ]

bryan thompson commented on ZOOKEEPER-344:
--

I missed the question about the zk server. It is an 8 core (2 quad core Opterons) 4x512k cache, 2.3Ghz clock with 32G ram.
[jira] Commented: (ZOOKEEPER-344) doIO in NioServerCnxn: Exception causing close of session : cause is read error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683327#action_12683327 ] Patrick Hunt commented on ZOOKEEPER-344: Bryan, that's good info. It doesn't sound like zk server latency is the issue then; you have an excess of cpu/memory based on the tests you are running. However, it would be good to verify using jmx or the stat command. If you can run with DEBUG logging enabled (server and client) it might give you more insight. Also, running at DEBUG level will cause the stack of the read error you are seeing to be printed to the server log (zk version 3.1). If you can share all/part of the logs please feel free to attach them to this JIRA. It's probably this code in server doIO, though, that's causing the server side read error exception you are seeing:
    int rc = sock.read(incomingBuffer);
    if (rc < 0) { throw new IOException("Read error"); }
read returns "The number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream"; this indicates to me that the client has closed the connection. Also, looking at your logs, the client log is from 13:35 while the server log is from 13:06. Assuming that the clocks are even fairly close, this is almost a 30 min difference; if true, it's unlikely the events are correlated. My guess is that the client is closing the connection for some reason, but it would be interesting to see the debug logs (with clocks that are fairly close on server/client so it would be easier to correlate the log events). Hope this helps. 
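To illustrate the end-of-stream behaviour described in the comment above, here is a minimal, self-contained sketch. It is not the ZooKeeper server code: the class name ReadEofDemo and the method readAfterPeerClose are made up for this example, it uses a loopback socket pair rather than a real ZooKeeper client, and it assumes a Java 7 or later JDK (try-with-resources), which is newer than the jdk1.6.0_07 listed in the issue environment. It only shows that a blocking SocketChannel.read returns -1 once the peer has closed its end of the connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not part of ZooKeeper.
class ReadEofDemo {

    // Accepts one loopback connection, lets the "client" side close it,
    // and returns what read() reports on the "server" side.
    public static int readAfterPeerClose() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();

            // Stand-in for the client connecting to the server.
            SocketChannel client = SocketChannel.open(addr);
            try (SocketChannel sock = server.accept()) {
                // The client closes without sending any data.
                client.close();

                // Same shape as the snippet quoted from doIO.
                ByteBuffer incomingBuffer = ByteBuffer.allocate(4);
                return sock.read(incomingBuffer);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int rc = readAfterPeerClose();
        System.out.println("rc = " + rc);
        if (rc < 0) {
            // This is the condition under which the quoted server code
            // throws IOException("Read error").
            System.out.println("peer closed the connection (end-of-stream)");
        }
    }
}
```

Running main prints rc = -1, which is the condition that makes the quoted check throw the "Read error" IOException. The sketch says nothing about why a real client would close its connection; it only reproduces the symptom on the reading side.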