I'd like to suggest the following breakdown of the test scenarios, to be done in the following order.
*Broker level*

1) MB cluster pointing to a single Cassandra node and a single ZooKeeper node -- test broker failover when broker nodes are down.
2) MB cluster pointing to a Cassandra cluster and a single ZooKeeper node -- test broker failover when broker nodes are down.
3) MB cluster pointing to a Cassandra cluster and a ZooKeeper cluster -- test broker failover when broker nodes are down.

*Cassandra/ZK level*

4) MB cluster pointing to a Cassandra cluster and a ZooKeeper cluster -- test failover when Cassandra nodes are down (use correct replication factors).
5) MB cluster pointing to a Cassandra cluster and a ZooKeeper cluster -- test failover when ZooKeeper nodes are down (use correct replication factors; test new queue creation too).

*Hybrid*

6) MB cluster created with internal Cassandra and ZooKeeper clusters -- test broker failover (which means that when a broker is down, its internal ZooKeeper and Cassandra nodes will also be unavailable).

cheers,
Charith

On Fri, Jun 15, 2012 at 9:39 AM, Charith Wickramarachchi <[email protected]> wrote:

> Hi Hasitha,
>
> Can you try pointing to a separate Cassandra ring instead of using the internal
> ones, and killing the seed Cassandra nodes along with the broker?
>
> Basically, the first scenario we need to get working is failover while
> killing broker nodes one by one. We do not need to bring Cassandra-level
> failover into the picture initially, since then we have too many
> variables. After we get this scenario to work, we can look
> at Cassandra-level failover.
>
> I think the best way to go through this test-and-fix session is to do
> it iteratively, level by level. So let's first test broker-level failover
> and make sure it works. If there are issues, let's fix them and then go to the
> next level; otherwise it is hard to isolate the issues.
>
> cheers,
> Charith
>
> On Fri, Jun 15, 2012 at 9:27 AM, Hasitha Hiranya <[email protected]> wrote:
>
>> Hi,
>>
>> I did a careful test on the MB2 pack with the deployment pattern "*Use inbuilt
>> Cassandra server and ZooKeeper server for all the broker nodes*".
>>
>> The results, step by step:
>>
>> *Environment*
>>
>> - Three machines: M1, M2, M3.
>> - Three broker nodes: BR1, BR2, BR3 (one per machine).
>> - The Cassandra instances in BR2 and BR3 are named as seeds.
>> - JMS queue senders at M1 and M2.
>> - JMS queue receiver at M3.
>>
>> *Test steps and results*
>>
>> - Sent 20 messages to the cluster using the client at M1 and received them using
>> the client at M3. *All messages received. No exceptions. No message duplication.*
>>
>> - Sent 20 messages to BR1 and killed it. When BR1 is killed, the
>> other servers keep seeking a ZooKeeper connection from the killed node,
>> printing a log line for each attempt (this inflates the log size). Then ran the
>> JMS client at the M3 node. No messages were received; the management console
>> showed 0 messages. Thus all 20 messages were lost.
>>
>> - Then started BR1 again. The exceptions from BR2 and BR3 stopped. Ran the
>> queue receiver again. No messages were received.
>>
>> - Killed *seed node* BR2. The others reported that BR2 was dead and removed it from
>> gossip. ZooKeeper leader election ran around the ring and confirmed the
>> leader. Here, killing seed node BR2 did not create exceptions at BR3 (seed)
>> or BR1 (non-seed), and there were no ZooKeeper connection-refused exceptions either.
>>
>> - Began the test again with all of BR1, BR2, BR3 up. Killed BR1 and sent 20
>> messages (now failover should detect that BR2 is up). But now BR2 and BR3 print
>> continuous connection-refused errors for the ZooKeeper connection at BR1 [1]. The
>> JMS client also reports connection refused (meaning BR2 or BR3 is not responding, or
>> the client does not detect that they are up).
>>
>> *Exceptions*
>> [1].
>> java.net.ConnectException: Connection refused
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
>> [2012-06-15 08:52:04,998] INFO {org.apache.zookeeper.ClientCnxn} - Opening socket connection to server nodex/192.168.0.100:2181
>> [2012-06-15 08:52:05,000] WARN {org.apache.zookeeper.ClientCnxn} - Session 0x137ee21ac480000 for server null, unexpected error, closing socket connection and attempting reconnect
>>
>> *Issues*
>>
>> - When a queue (which is distributed) is deleted, exceptions occur.
>> - The nodes list only shows the local node.
>> - A lot of logs are printed while starting as a cluster and seeking connections.
>> - To create a binding for the queue, the queue listener must run before the
>> queue sender, which is not acceptable.
>>
>> I will file some JIRAs.
>>
>> Thanks.
>>
>> On Mon, Jun 11, 2012 at 5:08 PM, Hasitha Hiranya <[email protected]> wrote:
>>
>>> Hi Srinath, Charith, Shammi,
>>>
>>> I did HA tests for the MB M2 pack. Results can be found at
>>> https://docs.google.com/a/wso2.com/spreadsheet/ccc?key=0Ap7HoxWKqqNndEZFRzNpNW1wOGlGckUtcUhBTzlUTkE#gid=0
>>>
>>> Please note that the test results, exceptions that occurred, and sent and
>>> received message details are on different sheets.
>>>
>>> Thanks.
>>>
>>> --
>>> *Hasitha Abeykoon*
>>> Software Engineer; WSO2, Inc.; http://wso2.com
>>> *cell:* +94 719363063
>>> *blog:* abeykoon.blogspot.com <http://abeykoon.blogspot.com>
>>
>
> --
> Charith Dhanushka Wickramarachchi
> Senior Software Engineer
> WSO2 Inc
> http://wso2.com/
> http://wso2.org/
>
> blog: http://charithwiki.blogspot.com/
>
> twitter: http://twitter.com/charithwiki
>
> Mobile: 0776706568
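[Editor's aside on exception [1] in the test report above: the log flood happens because the surviving nodes retry the dead node's ZooKeeper endpoint in a tight loop, logging every attempt. A generic mitigation for this class of problem is capped exponential backoff between reconnect attempts. The sketch below is illustrative only; it is not what the ZooKeeper client or MB actually implements, and the names and parameters are invented for the example.]

```python
import itertools

def backoff_delays(base=0.5, cap=30.0, factor=2.0):
    """Yield capped exponential reconnect delays: 0.5s, 1s, 2s, ... up to cap."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)

# The first few delays a reconnect loop would sleep between attempts,
# instead of hammering the dead endpoint and logging on every try:
print([round(d, 1) for d in itertools.islice(backoff_delays(), 8)])
```

With delays like these, a node that is down for a minute produces a handful of log lines rather than thousands.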
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
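[Editor's footnote on the "correct replication factors" mentioned in scenarios 4 and 5: with Cassandra's QUORUM consistency level, a quorum is floor(RF/2) + 1 replicas, so RF = 3 tolerates one replica being down while QUORUM reads and writes still succeed. A minimal sketch of that arithmetic; the function names are my own for illustration, not part of any MB or Cassandra API.]

```python
def quorum(rf: int) -> int:
    """Replicas that must acknowledge a QUORUM read/write for replication factor rf."""
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return rf - quorum(rf)

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Note that RF = 2 tolerates no failures at QUORUM, which is why the Cassandra-failover scenarios need RF of at least 3.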
