Hello again,

I ran some more tests with a separate GossipServer. I downloaded JGroups 
2.2.7 from the JGroups website and installed it on another Linux box.

I changed the configuration files of my two servers so that they use the gossip 
server as the reference server.

Configuration for the legolasl host:

        <TCP
              bind_addr="legolaslg"
              loopback="false"
        />
        <TCPGOSSIP
              initial_hosts="eowyng[8903]"
              gossip_refresh_rate="20000"
              num_initial_members="2"
              up_thread="true"
              down_thread="true"
        />


Configuration for the gimli host:

        <TCP
              bind_addr="gimlig"
              loopback="false"
        />
        <TCPGOSSIP
              initial_hosts="eowyng[8903]"
              gossip_refresh_rate="20000"
              num_initial_members="2"
              up_thread="true"
              down_thread="true"
        />

I am still using the same private network.

I started JBoss on both nodes. The cluster formed successfully and the 
GossipServer showed both connections.
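
The register/refresh/lookup cycle that the logs show (a REGISTER_REQ every 20 s, matching gossip_refresh_rate="20000", plus occasional GET_REQ lookups) can be pictured with a toy in-memory registry. This is a minimal Python sketch under assumptions, not the JGroups GossipServer code: the class name, the expiry_ms parameter and the rule that entries not refreshed in time are dropped are all illustrative.

```python
import time


class ToyGossipRegistry:
    """Toy in-memory model of a gossip server (hypothetical, not JGroups code).

    Members register under a group name and must refresh the entry before
    it expires; a lookup returns only members with a fresh registration.
    """

    def __init__(self, expiry_ms, clock=time.monotonic):
        self.expiry_s = expiry_ms / 1000.0  # assumed expiry, in seconds
        self.clock = clock                  # injectable clock for testing
        self.groups = {}                    # group -> {member: last refresh}

    def register(self, group, member):
        """Rough equivalent of a REGISTER_REQ: create or refresh an entry."""
        self.groups.setdefault(group, {})[member] = self.clock()

    def get_members(self, group):
        """Rough equivalent of a GET_REQ: drop stale entries, return the rest."""
        now = self.clock()
        entries = self.groups.get(group, {})
        for member, last in list(entries.items()):
            if now - last > self.expiry_s:
                del entries[member]
        return sorted(entries)


# Simulated timeline with a fake clock (seconds).
now = [0.0]
registry = ToyGossipRegistry(expiry_ms=30000, clock=lambda: now[0])
registry.register("DefaultPartition", "gimlig:7800")
registry.register("DefaultPartition", "legolaslg:7800")

now[0] = 20.0  # next refresh round: eth1 is down, only gimli can refresh
registry.register("DefaultPartition", "gimlig:7800")

now[0] = 35.0
print(registry.get_members("DefaultPartition"))  # ['gimlig:7800']
```

A node whose interface is down can neither refresh its own entry nor perform lookups, which is consistent with the "Connection timed out" errors on legolasl further down.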

I started a client, used a few screens and displayed a few orders.

I then disabled the private network interface on host legolasl:

        ifconfig eth1 down

I got the following messages on the gimli host:

16:50:45,707 INFO  [GossipClient] refresher task is run
16:50:45,707 INFO  [GossipClient] registering DefaultPartition : gimlig:7800 (additional data: 18 bytes)
16:50:45,707 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:50:45,708 INFO  [GossipClient] refresher task done. Registered 1 items
16:51:05,717 INFO  [GossipClient] refresher task is run
16:51:05,717 INFO  [GossipClient] registering DefaultPartition : gimlig:7800 (additional data: 18 bytes)
16:51:05,717 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:51:05,718 INFO  [GossipClient] refresher task done. Registered 1 items
16:51:20,467 WARN  [FD] ping_dest is null: members=[legolaslg:7800 (additional data: 18 bytes), gimlig:7800 (additional data: 18 bytes)], pingable_mbrs=[gimlig:7800 (additional data: 18 bytes)], local_addr=gimlig:7800 (additional data: 18 bytes)
16:51:20,974 INFO  [DefaultPartition] Suspected member: legolaslg:7800 (additional data: 18 bytes)
16:51:20,976 INFO  [DefaultPartition] New cluster view for partition DefaultPartition (id: 2, delta: -1) : [172.21.158.20:1099]
16:51:20,977 INFO  [DefaultPartition] I am (172.21.158.20:1099) received membershipChanged event:
16:51:20,977 INFO  [DefaultPartition] Dead members: 1 ([172.21.158.37:1099])
16:51:20,977 INFO  [DefaultPartition] New Members : 0 ([])
16:51:20,977 INFO  [DefaultPartition] All Members : 1 ([172.21.158.20:1099])
16:51:21,879 INFO  [TomcatDeployer] deploy, ctxPath=/jbossmq-httpil, warUrl=file:/opt/jboss-3.2.7/server/all/deploy-hasingleton/jms/jbossmq-httpil.sar/jbossmq-httpil.war/
16:51:22,499 INFO  [A] Bound to JNDI name: queue/A
16:51:22,503 INFO  [B] Bound to JNDI name: queue/B
16:51:22,506 INFO  [C] Bound to JNDI name: queue/C
16:51:22,510 INFO  [D] Bound to JNDI name: queue/D
16:51:22,513 INFO  [ex] Bound to JNDI name: queue/ex
16:51:22,550 INFO  [testTopic] Bound to JNDI name: topic/testTopic
16:51:22,553 INFO  [securedTopic] Bound to JNDI name: topic/securedTopic
16:51:22,555 INFO  [testDurableTopic] Bound to JNDI name: topic/testDurableTopic
16:51:22,559 INFO  [testQueue] Bound to JNDI name: queue/testQueue
16:51:22,625 INFO  [OILServerILService] JBossMQ OIL service available at : /0.0.0.0:8090
16:51:22,700 INFO  [UILServerILService] JBossMQ UIL service available at : /0.0.0.0:8093
16:51:22,760 INFO  [DLQ] Bound to JNDI name: queue/DLQ
16:51:25,727 INFO  [GossipClient] refresher task is run
16:51:25,727 INFO  [GossipClient] registering DefaultPartition : gimlig:7800 (additional data: 18 bytes)
16:51:25,727 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:51:25,728 INFO  [GossipClient] refresher task done. Registered 1 items
16:51:36,527 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:51:45,737 INFO  [GossipClient] refresher task is run
16:51:45,737 INFO  [GossipClient] registering DefaultPartition : gimlig:7800 (additional data: 18 bytes)
16:51:45,737 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:51:45,738 INFO  [GossipClient] refresher task done. Registered 1 items
16:51:50,317 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:52:05,747 INFO  [GossipClient] refresher task is run
16:52:05,747 INFO  [GossipClient] registering DefaultPartition : gimlig:7800 (additional data: 18 bytes)


I got the following messages on the legolasl host:

16:50:26,824 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:50:26,884 INFO  [GossipClient] refresher task is run
16:50:26,884 INFO  [GossipClient] registering DefaultPartition : legolaslg:7800 (additional data: 18 bytes)
16:50:26,885 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:50:26,885 INFO  [GossipClient] refresher task done. Registered 1 items
16:50:29,045 INFO  [EJBHomeFactory] Searching com/dmv/ejb/operations/operations/facade/OperationsEJB component on SERVER_CONTEXT...
16:50:29,053 INFO  [EJBHomeFactory] Found com/dmv/ejb/operations/operations/facade/OperationsEJB component on SERVER_CONTEXT
16:50:44,075 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:50:46,894 INFO  [GossipClient] refresher task is run
16:50:46,895 INFO  [GossipClient] registering DefaultPartition : legolaslg:7800 (additional data: 18 bytes)
16:50:46,895 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:50:59,744 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:53:55,895 ERROR [GossipClient] exception connecting to host eowyng:8903: java.net.ConnectException: Connection timed out
16:53:55,895 INFO  [GossipClient] refresher task done. Registered 1 items
16:53:55,895 INFO  [GossipClient] refresher task is run
16:53:55,895 INFO  [GossipClient] registering DefaultPartition : legolaslg:7800 (additional data: 18 bytes)
16:53:55,895 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:54:08,744 ERROR [GossipClient] exception connecting to host eowyng:8903: java.net.ConnectException: Connection timed out
16:54:08,745 ERROR [TCPGOSSIP] [FIND_INITIAL_MBRS]: gossip client found no members
16:54:12,045 WARN  [FD] ping_dest is null: members=[legolaslg:7800 (additional data: 18 bytes), gimlig:7800 (additional data: 18 bytes)], pingable_mbrs=[legolaslg:7800 (additional data: 18 bytes)], local_addr=legolaslg:7800 (additional data: 18 bytes)
16:54:14,554 WARN  [FD] ping_dest is null: members=[legolaslg:7800 (additional data: 18 bytes), gimlig:7800 (additional data: 18 bytes)], pingable_mbrs=[legolaslg:7800 (additional data: 18 bytes)], local_addr=legolaslg:7800 (additional data: 18 bytes)
16:54:14,765 INFO  [DefaultPartition] Suspected member: gimlig:7800 (additional data: 18 bytes)
16:54:14,766 INFO  [DefaultPartition] New cluster view for partition DefaultPartition (id: 2, delta: -1) : [172.21.158.37:1099]
16:54:14,767 INFO  [DefaultPartition] I am (172.21.158.37:1099) received membershipChanged event:
16:54:14,767 INFO  [DefaultPartition] Dead members: 1 ([172.21.158.20:1099])
16:54:14,767 INFO  [DefaultPartition] New Members : 0 ([])
16:54:14,767 INFO  [DefaultPartition] All Members : 1 ([172.21.158.37:1099])
16:54:24,954 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:57:04,894 ERROR [GossipClient] exception connecting to host eowyng:8903: java.net.ConnectException: Connection timed out
16:57:04,895 INFO  [GossipClient] refresher task done. Registered 1 items
16:57:04,895 INFO  [GossipClient] refresher task is run
16:57:04,895 INFO  [GossipClient] registering DefaultPartition : legolaslg:7800 (additional data: 18 bytes)
16:57:04,895 INFO  [GossipClient] REGISTER_REQ --> eowyng/172.21.158.36:8903
16:57:33,954 ERROR [GossipClient] exception connecting to host eowyng:8903: java.net.ConnectException: Connection timed out
16:57:33,955 ERROR [TCPGOSSIP] [FIND_INITIAL_MBRS]: gossip client found no members
16:57:47,044 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903
16:57:47,044 INFO  [GossipClient] GET_REQ --> eowyng/172.21.158.36:8903



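One thing bothers me here, as far as I understand it: node legolasl seems to 
consider node gimli faulty, rather than considering itself faulty (legolasl 
cannot reach the gossip server). Each node ends up alone in its own view: gimli 
sees only 172.21.158.20:1099 and legolasl sees only 172.21.158.37:1099.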
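
Why both sides blame each other can be shown with a deliberately simplified model of failure detection in Python: each member keeps itself plus every peer it can still reach, and drops the rest. This only approximates what FD does (pinging a neighbour and suspecting it after missed responses); the function and helper names are hypothetical. The point it illustrates is that, from each node's local point of view, "the other node vanished" and "my own link failed" look the same.

```python
def views_after_failure_detection(members, can_reach):
    """Return the view each member would install (simplified model).

    can_reach(a, b) tells whether member a can still talk to member b.
    Each member keeps itself and every peer it can reach; everyone else
    is suspected and removed. Timeouts and the ping ring are not modelled.
    """
    return {
        me: [m for m in members if m == me or can_reach(me, m)]
        for me in members
    }


members = ["legolaslg:7800", "gimlig:7800"]


def can_reach(a, b):
    # eth1 is down on legolasl, so no traffic flows in either direction
    # between legolasl and any other host on the private network.
    return "legolaslg:7800" not in (a, b)


for node, view in views_after_failure_detection(members, can_reach).items():
    print(node, "->", view)
# legolaslg:7800 -> ['legolaslg:7800']
# gimlig:7800 -> ['gimlig:7800']
```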

I then tried to keep using some functions in the client. After a few function 
calls, the client hung. I stopped the client and restarted it, but I could not 
connect.
I stopped both nodes, restarted only the gimli node, and was then able to open 
a connection and work normally.

In that case, could this problem be solved by a cluster parameter telling a 
node to shut down when the gossip server is unreachable? Or maybe I did 
something wrong.
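
As a hypothetical illustration of the kind of parameter suggested above (not an existing JBoss or JGroups option), a watchdog could probe the gossip server (eowyng:8903 in the configuration above) with a plain TCP connect and call a shutdown hook after several consecutive failures. GossipWatchdog, tcp_probe, max_failures and on_isolated are invented names, and wiring on_isolated to an actual JBoss shutdown is left out of this Python sketch.

```python
import socket


def tcp_probe(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unreachable, unknown host...
        return False


class GossipWatchdog:
    """Hypothetical watchdog, not an existing JBoss or JGroups option.

    Calls on_isolated() once, after max_failures consecutive failed probes.
    A successful probe resets the failure counter.
    """

    def __init__(self, probe, max_failures, on_isolated):
        self.probe = probe
        self.max_failures = max_failures
        self.on_isolated = on_isolated
        self.failures = 0
        self.fired = False

    def check(self):
        """Run one probe; meant to be called periodically (e.g. every 20 s)."""
        if self.probe():
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures and not self.fired:
                self.fired = True
                self.on_isolated()
        return self.failures


# Real use might be: GossipWatchdog(lambda: tcp_probe("eowyng", 8903), 3, stop)
# where stop is whatever shuts down the local JBoss instance (not shown).
# Demo with a scripted probe instead of the network:
outcomes = iter([True, False, False, False])
dog = GossipWatchdog(lambda: next(outcomes), 3,
                     lambda: print("isolated: shutting down"))
for _ in range(4):
    dog.check()
```

One drawback of this design: if the gossip server host itself fails, every node fails the probe and the whole cluster shuts down, even though the nodes can still see each other.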

In any case, a gossip server acting as an independent third party seems to be 
the most appropriate solution for our problem (cluster nodes split across two 
different rooms). My feeling is that if a node fails completely, there will be 
no problem. If only a network card fails, however, the failure condition is not 
"big enough" for the cluster to handle correctly.
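
The "independent third party" idea corresponds to the common quorum approach with an arbiter (witness) vote: with two nodes plus the arbiter there are three votes, and a node keeps serving only while it holds a strict majority. Below is a minimal Python sketch under that assumption; has_quorum is a hypothetical helper, and treating the gossip server host as the arbiter is an illustrative choice, not something JGroups 2.2.7 does on its own.

```python
def has_quorum(me, all_nodes, reachable_peers, arbiter_reachable):
    """Return True if `me` sees a strict majority of the votes.

    Votes: one per cluster node plus one for an independent arbiter
    (hypothetically, the host running the gossip server). A node always
    counts its own vote.
    """
    total = len(all_nodes) + 1
    votes = 1
    votes += sum(1 for n in all_nodes if n != me and n in reachable_peers)
    if arbiter_reachable:
        votes += 1
    return 2 * votes > total


nodes = ["legolasl", "gimli"]
# eth1 down on legolasl: it reaches neither gimli nor the arbiter.
print(has_quorum("legolasl", nodes, set(), arbiter_reachable=False))  # False
# gimli lost legolasl but still reaches the arbiter (eowyng).
print(has_quorum("gimli", nodes, set(), arbiter_reachable=True))      # True
```

In the scenario above, legolasl holds 1 vote out of 3 and would stop, while gimli holds 2 and would keep running. If only the arbiter fails, the two nodes still see each other and keep 2 votes each.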

Thanks in advance for your help.

View the original post : 
http://www.jboss.org/index.html?module=bb&op=viewtopic&p=3866946#3866946

Reply to the post : 
http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=3866946


_______________________________________________
JBoss-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jboss-user
