I'd like to suggest the following breakdown of the test scenarios, to be done in the following order.
*Broker level*

1) MB cluster pointing to a single Cassandra node and a single ZooKeeper node -- test broker failover when broker nodes are down.
2) MB cluster pointing to a Cassandra cluster and a single ZooKeeper node -- test broker failover when broker nodes are down.
3) MB cluster pointing to a Cassandra cluster and a ZooKeeper cluster -- test broker failover when broker nodes are down.

*Cassandra/ZK level*

4) MB cluster pointing to a Cassandra cluster and a ZooKeeper cluster -- test failover when Cassandra nodes are down (use correct replication factors).
5) MB cluster pointing to a Cassandra cluster and a ZooKeeper cluster -- test failover when ZooKeeper nodes are down (use correct replication factors; test new queue creation too).

*Hybrid*

6) MB cluster created with internal Cassandra and ZooKeeper clusters -- test broker failover (which means that when a broker is down, its internal ZooKeeper and Cassandra nodes will also be unavailable).

cheers,
Charith

On Fri, Jun 15, 2012 at 9:39 AM, Charith Wickramarachchi <[email protected]> wrote:

> Hi Hasitha,
>
> Can you try pointing to a separate Cassandra ring instead of using the internal
> ones, and killing the seed Cassandra nodes along with the broker?
>
> Basically, the first scenario we need to get working is failover while
> killing broker nodes one by one. We do not need to bring Cassandra-level
> failover into the picture initially, since then we have too many
> variables. After we get this scenario to work, we can look
> at Cassandra-level failover.
>
> I think the best way to go through this test-and-fix session is to do
> it iteratively, level by level. So let's first test broker-level failover
> and make sure it works. If there are issues, let's fix them and then go to the
> next level; otherwise it is hard to isolate the issues.
>
> cheers,
> Charith
>
> On Fri, Jun 15, 2012 at 9:27 AM, Hasitha Hiranya <[email protected]> wrote:
>
>> Hi,
>>
>> I did a careful test on the MB2 pack with the deployment pattern "*Use inbuilt
>> Cassandra server and ZooKeeper server for all the broker nodes*".
>>
>> The results, step by step:
>>
>> *Environment*
>>
>> - Three machines: M1, M2, M3.
>> - Three broker nodes: BR1, BR2, BR3 (one per machine).
>> - The Cassandra instances in BR2 and BR3 are named as seeds.
>> - JMS queue senders at M1 and M2.
>> - JMS queue receiver at M3.
>>
>> *Test steps and results*
>>
>> - Sent 20 messages to the cluster using the client at M1 and received them using
>> the client at M3. *All messages received. No exceptions. No message duplication.*
>>
>> - Sent 20 messages to BR1 and killed it. When BR1 is killed, the
>> other servers keep seeking a ZooKeeper connection from the killed node,
>> printing a log line for each attempt (this inflates the log size). Then ran the
>> JMS client at the M3 node. No messages were received; the management console
>> showed 0 messages. Thus all 20 messages were lost.
>>
>> - Then started BR1 again. The exceptions from BR2 and BR3 stopped. Ran the
>> queue receiver again. No messages were received.
>>
>> - Killed *seed node* BR2. The others reported that BR2 was dead and removed it from
>> gossip. ZooKeeper leader election ran around the ring and confirmed the
>> leader. Here, killing seed node BR2 did not create exceptions at BR3 (seed)
>> or BR1 (non-seed), and there were no ZooKeeper connection-refused exceptions either.
>>
>> - Began the test again with all of BR1, BR2, BR3 up. Killed BR1 and sent 20
>> messages (now failover should detect that BR2 is up). But now BR2 and BR3 print
>> continuous connection-refused errors for the ZooKeeper connection at BR1 [1]. The
>> JMS client also reports connection refused (meaning BR2 or BR3 is not responding, or
>> the client does not detect that they are up).
>>
>> *Exceptions*
>> [1].
>> java.net.ConnectException: Connection refused
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
>> [2012-06-15 08:52:04,998] INFO {org.apache.zookeeper.ClientCnxn} - Opening socket connection to server nodex/192.168.0.100:2181
>> [2012-06-15 08:52:05,000] WARN {org.apache.zookeeper.ClientCnxn} - Session 0x137ee21ac480000 for server null, unexpected error, closing socket connection and attempting reconnect
>>
>> *Issues*
>>
>> - When a queue (which is distributed) is deleted, exceptions occur.
>> - The nodes list only shows the local node.
>> - A lot of logs are printed while starting as a cluster and seeking connections.
>> - To create a binding for the queue, the queue listener must run before the
>> queue sender, which is not acceptable.
>>
>> I will file some JIRAs.
>>
>> Thanks.
>>
>> On Mon, Jun 11, 2012 at 5:08 PM, Hasitha Hiranya <[email protected]> wrote:
>>
>>> Hi Srinath, Charith, Shammi,
>>>
>>> I did HA tests for the MB M2 pack. Results can be found at
>>> https://docs.google.com/a/wso2.com/spreadsheet/ccc?key=0Ap7HoxWKqqNndEZFRzNpNW1wOGlGckUtcUhBTzlUTkE#gid=0
>>>
>>> Please note that the test results, exceptions that occurred, and sent and
>>> received message details are on different sheets.
>>>
>>> Thanks.
>>>
>>> --
>>> *Hasitha Abeykoon*
>>> Software Engineer; WSO2, Inc.; http://wso2.com
>>> *cell:* +94 719363063
>>> *blog:* abeykoon.blogspot.com <http://abeykoon.blogspot.com>
>>
>
> --
> Charith Dhanushka Wickramarachchi
> Senior Software Engineer
> WSO2 Inc
> http://wso2.com/
> http://wso2.org/
>
> blog: http://charithwiki.blogspot.com/
>
> twitter: http://twitter.com/charithwiki
>
> Mobile: 0776706568
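[Editor's aside on exception [1] in the test report above: the log flood happens because the surviving nodes retry the dead node's ZooKeeper endpoint in a tight loop, logging every attempt. A generic mitigation for this class of problem is capped exponential backoff between reconnect attempts. The sketch below is illustrative only; it is not what the ZooKeeper client or MB actually implements, and the names and parameters are invented for the example.]

```python
import itertools

def backoff_delays(base=0.5, cap=30.0, factor=2.0):
    """Yield capped exponential reconnect delays: 0.5s, 1s, 2s, ... up to cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)

# The first few delays a reconnect loop would sleep between attempts,
# instead of hammering the dead endpoint and logging on every try:
print([round(d, 1) for d in itertools.islice(backoff_delays(), 8)])
```

With delays like these, a node that is down for a minute produces a handful of log lines rather than thousands.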
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
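[Editor's footnote on the "correct replication factors" mentioned in scenarios 4 and 5: with Cassandra's QUORUM consistency level, a quorum is floor(RF/2) + 1 replicas, so RF = 3 tolerates one replica being down while QUORUM reads and writes still succeed. A minimal sketch of that arithmetic; the function names are my own for illustration, not part of any MB or Cassandra API.]

```python
def quorum(rf: int) -> int:
    """Replicas that must acknowledge a QUORUM read/write for replication factor rf."""
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return rf - quorum(rf)

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Note that RF = 2 tolerates no failures at QUORUM, which is why the Cassandra-failover scenarios need RF of at least 3.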
