Hi Hasitha,

Can you try pointing the brokers to a separate Cassandra ring instead of using the internal ones? That way, killing a broker does not also kill a seed Cassandra node.

Basically, the first scenario we need to get working is failover when broker nodes are killed one by one. We do not need to bring Cassandra-level failover into the picture initially, since then we have too many variables. After we get this scenario to work, we can look at Cassandra-level failover.

I think the best way to go through this test-and-fix session is to do it iteratively, level by level. So let's first test broker-level failover and make sure it works. If there are issues, let's fix them and then go to the next level. Otherwise it's hard to isolate the issues.

Cheers,
Charith

On Fri, Jun 15, 2012 at 9:27 AM, Hasitha Hiranya <[email protected]> wrote:
> Hi,
>
> I did a careful test on the MB2 pack with the deployment pattern "*Use inbuilt
> cassandra server and zoo keeper server for all the broker nodes*".
>
> Following are the results, step by step.
>
> *Environment.*
>
> - Three machines M1, M2, M3.
> - Three broker nodes BR1, BR2, BR3 (one per machine).
> - Designated the Cassandra instances in BR2, BR3 as seeds.
> - JMS queue senders at M1, M2.
> - JMS queue receiver at M3.
>
> *Test steps and results*.
>
> - Sent 20 messages to the cluster using the client at M1 and received them
> using the client at M3. *All messages received. No exceptions. No
> duplication of messages.*
>
> - Sent 20 messages to BR1 and killed it. When BR1 is killed, the other
> servers keep trying to reach the ZooKeeper server on the killed node and
> log every attempt (this increases the log size). Then ran the JMS client
> at M3. No messages were received, and the management console showed 0
> messages. Thus all 20 messages were lost.
>
> - Then started BR1 again. The exceptions from BR2, BR3 stopped. Ran the
> queue receiver again. No messages received.
>
> - Killed *seed node* BR2. The others reported BR2 as dead and removed it
> from gossip. ZooKeeper leader election ran round the ring and confirmed the
> leader. Here, killing seed node BR2 did not cause exceptions at BR3 (seed)
> or BR1 (non-seed)?
> No ZooKeeper "connection refused" exceptions either.
>
> - Began the test again with BR1, BR2, BR3 all up. Killed BR1 and sent 20
> messages (failover should now detect that BR2 is up). But now BR2 and BR3
> print continuous "connection refused" errors for the ZooKeeper connection
> to BR1 [1]. The JMS client also reports connection refused (meaning BR2 or
> BR3 is not responding, or the client does not detect that they are up).
>
> *Exceptions*
> [1]. java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
> [2012-06-15 08:52:04,998] INFO {org.apache.zookeeper.ClientCnxn} - Opening socket connection to server nodex/192.168.0.100:2181
> [2012-06-15 08:52:05,000] WARN {org.apache.zookeeper.ClientCnxn} - Session 0x137ee21ac480000 for server null, unexpected error, closing socket connection and attempting reconnect
>
> *Issues*
>
> - When the queue (which is distributed) is deleted, exceptions occur.
> - The node list only shows the local node.
> - A lot of logs are printed when starting as a cluster and while seeking
> connections.
> - In order to create a binding for the queue, the queue listener must run
> before the queue sender, which is not acceptable.
>
> I will create some JIRAs.
>
> Thanks.
>
>
> On Mon, Jun 11, 2012 at 5:08 PM, Hasitha Hiranya <[email protected]> wrote:
>
>> Hi Srinath, Charith, Shammi,
>>
>> I did HA tests for the MB M2 pack. Results can be found at
>>
>> https://docs.google.com/a/wso2.com/spreadsheet/ccc?key=0Ap7HoxWKqqNndEZFRzNpNW1wOGlGckUtcUhBTzlUTkE#gid=0
>>
>> Please note that the test results, the exceptions that occurred, and the
>> sent and received message details are on different sheets.
>>
>> Thanks.
>>
>> --
>> *Hasitha Abeykoon*
>> Software Engineer; WSO2, Inc.; http://wso2.com
>> *cell:* *+94 719363063*
>> *blog:* abeykoon.blogspot.com <http://abeykoon.blogspot.com>
>
>

--
Charith Dhanushka Wickramarachchi
Senior Software Engineer
WSO2 Inc
http://wso2.com/
http://wso2.org/

blog http://charithwiki.blogspot.com/
twitter http://twitter.com/charithwiki

Mobile : 0776706568
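The pass criteria used in the broker-level steps above ("all messages received", "no duplication of messages", "all 20 messages lost") can be checked mechanically after each broker is killed. The Java sketch below is a hypothetical test helper, not part of the MB codebase: `DeliveryCheck`, `compare` and the `msg-N` IDs are made-up names, and it assumes every test message carries a unique ID (for example a sequence number set by the sender) that the receiver records. It only does the bookkeeping; sending, killing brokers and receiving are still done by the existing JMS clients.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical test helper (not part of WSO2 MB): compares the message IDs a
// sender published with the IDs a receiver consumed after one failover step.
final class DeliveryCheck {

    // Outcome of one comparison; the step passes only if all three are empty.
    static final class Report {
        final Set<String> lost;                 // sent but never received
        final Map<String, Integer> duplicated;  // received more than once -> times received
        final Set<String> unexpected;           // received but not sent in this run

        Report(Set<String> lost, Map<String, Integer> duplicated, Set<String> unexpected) {
            this.lost = Collections.unmodifiableSet(lost);
            this.duplicated = Collections.unmodifiableMap(duplicated);
            this.unexpected = Collections.unmodifiableSet(unexpected);
        }

        boolean passed() {
            return lost.isEmpty() && duplicated.isEmpty() && unexpected.isEmpty();
        }
    }

    static Report compare(List<String> sent, List<String> received) {
        // Count how many times each message ID was received.
        Map<String, Integer> counts = new HashMap<>();
        for (String id : received) {
            counts.merge(id, 1, Integer::sum);
        }

        // Any sent ID with no receive count was lost.
        Set<String> sentIds = new LinkedHashSet<>(sent);
        Set<String> lost = new LinkedHashSet<>();
        for (String id : sentIds) {
            if (!counts.containsKey(id)) {
                lost.add(id);
            }
        }

        // Split received IDs into duplicates of sent messages and strays.
        Map<String, Integer> duplicated = new TreeMap<>();
        Set<String> unexpected = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (!sentIds.contains(e.getKey())) {
                unexpected.add(e.getKey());
            } else if (e.getValue() > 1) {
                duplicated.put(e.getKey(), e.getValue());
            }
        }
        return new Report(lost, duplicated, unexpected);
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            sent.add("msg-" + i);
        }

        // Like the first step: all 20 messages received exactly once.
        Report ok = compare(sent, sent);
        System.out.println("step 1 passed=" + ok.passed());

        // Like the second step: broker killed after sending, nothing received.
        Report killed = compare(sent, new ArrayList<>());
        System.out.println("step 2 passed=" + killed.passed() + ", lost=" + killed.lost.size());
    }
}
```

Running `main` prints `step 1 passed=true` and then `step 2 passed=false, lost=20`. Comparing IDs rather than plain counts is what separates lost messages from duplicates: a receiver that gets 20 messages could still have lost some and received others twice.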
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
