Another experiment:
I tried updating the JBoss lib to use the latest stable JGroups, 2.4.0.
This time Node B shows a slightly different trace.
anonymous wrote : 2007-03-06 14:20:09,055 DEBUG
[org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel
connected; closing channel -- Initial state transfer failed:
Channel.getState() returned false
|
It looks like Node B hit an exception and then sent a LEAVE request to
Node A. Here's a more detailed trace of Node B:
anonymous wrote : 2007-03-06 14:19:38,990 DEBUG [org.jgroups.protocols.FD_SOCK]
VIEW_CHANGE received: [192.168.1.100:32772, 192.168.1.105:32825]
| 2007-03-06 14:19:38,991 DEBUG [org.jgroups.protocols.FD] suspected_mbrs:
[], after adjustment: []
| 2007-03-06 14:19:38,993 DEBUG [org.jgroups.protocols.FD_SOCK]
determinePingDest()=192.168.1.100:32772, pingable_mbrs=[192.168.1.100:32772,
192.168.1.105:32825]
| 2007-03-06 14:19:38,994 DEBUG [org.jgroups.protocols.FD_SOCK]
ping_dest=192.168.1.100:32772, ping_sock=Socket[addr=/192.168.1.100,port=54423,
localport=47344], cache={192.168.1.100:32772=192.168.1.100:54423}
| 2007-03-06 14:19:39,047 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] ViewAccepted:
initial members set
| 2007-03-06 14:19:39,047 DEBUG
[org.jboss.ha.framework.server.ClusterPartition] Starting channel
| 2007-03-06 14:19:39,048 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] get nodeName
| 2007-03-06 14:19:39,048 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Get current members
| 2007-03-06 14:19:39,048 INFO
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Number of cluster
members: 2
| 2007-03-06 14:19:39,049 INFO
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Other members: 1
| 2007-03-06 14:19:39,049 INFO
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Fetching state (will
wait for 30000 milliseconds):
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER]
GET_STATE: asking 192.168.1.100:32772 for state
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER]
passing down a SUSPEND_STABLE event
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE]
suspending message garbage collection
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume
task started, max_suspend_time=33000
| 2007-03-06 14:19:43,168 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:19:43,208 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:19:46,920 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:19:49,000 DEBUG [org.jgroups.util.TimeScheduler] Running task
true
| 2007-03-06 14:19:49,000 DEBUG [org.jgroups.protocols.FD] sending
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:19:49,001 DEBUG [org.jgroups.protocols.FD] received ack from
192.168.1.100:32772
| 2007-03-06 14:19:55,487 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:19:55,559 DEBUG [org.jgroups.util.TimeScheduler] Running task
true
| 2007-03-06 14:19:55,559 DEBUG [org.jgroups.protocols.FD] sending
are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
| 2007-03-06 14:19:55,599 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:19:55,633 DEBUG [org.jgroups.protocols.FD] received ack from
192.168.1.100:32770
| 2007-03-06 14:19:59,007 DEBUG [org.jgroups.util.TimeScheduler] Running task
true
| 2007-03-06 14:19:59,007 DEBUG [org.jgroups.protocols.FD] sending
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:19:59,008 DEBUG [org.jgroups.protocols.FD] received ack from
192.168.1.100:32772
| 2007-03-06 14:20:09,015 DEBUG [org.jgroups.util.TimeScheduler] Running task
true
| 2007-03-06 14:20:09,015 DEBUG [org.jgroups.protocols.FD] sending
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:20:09,016 DEBUG [org.jgroups.protocols.FD] received ack from
192.168.1.100:32772
| 2007-03-06 14:20:09,055 DEBUG
[org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel
connected; closing channel -- Initial state transfer failed:
Channel.getState() returned false
| 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.STABLE]
resuming message garbage collection
| 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.GMS] sending
LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
| 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS]
view=[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS]
view=[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:20:13,538 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:20:13,578 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:20:14,062 DEBUG [org.jgroups.protocols.pbcast.GMS] sending
LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
| 2007-03-06 14:20:14,371 DEBUG [org.jgroups.util.TimeScheduler] Running task
6-6
| 2007-03-06 14:20:14,978 DEBUG [org.jgroups.util.TimeScheduler] Running task
6-6
| 2007-03-06 14:20:15,494 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:20:15,566 DEBUG [org.jgroups.util.TimeScheduler] Running task
true
| 2007-03-06 14:20:15,566 DEBUG [org.jgroups.protocols.FD] sending
are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
| 2007-03-06 14:20:15,602 DEBUG [org.jgroups.util.TimeScheduler] Running task
[EMAIL PROTECTED]
| 2007-03-06 14:20:15,635 DEBUG [org.jgroups.protocols.FD] received ack from
192.168.1.100:32770
| 2007-03-06 14:20:16,186 DEBUG [org.jgroups.util.TimeScheduler] Running task
6-6
| 2007-03-06 14:20:18,594 DEBUG [org.jgroups.util.TimeScheduler] Running task
6-6
| 2007-03-06 14:20:19,022 DEBUG [org.jgroups.util.TimeScheduler] Running task
true
| 2007-03-06 14:20:19,023 DEBUG [org.jgroups.protocols.FD] sending
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:20:19,024 DEBUG [org.jgroups.protocols.FD] received ack from
192.168.1.100:32772
| 2007-03-06 14:20:19,070 DEBUG [org.jgroups.protocols.pbcast.GMS] sending
LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
| 2007-03-06 14:20:19,378 DEBUG [org.jgroups.util.TimeScheduler] Running task
7-7
| 2007-03-06 14:20:19,986 DEBUG [org.jgroups.util.TimeScheduler] Running task
7-7
| 2007-03-06 14:20:21,194 DEBUG [org.jgroups.util.TimeScheduler] Running task
7-7
| 2007-03-06 14:20:23,402 DEBUG [org.jgroups.util.TimeScheduler] Running task
6-6
| 2007-03-06 14:20:23,602 DEBUG [org.jgroups.util.TimeScheduler] Running task
7-7
| 2007-03-06 14:20:24,074 DEBUG [org.jgroups.protocols.pbcast.GMS]
192.168.1.105:32825 changed role to org.jgroups.protocols.pbcast.ClientGmsImpl
| 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] socket to
192.168.1.100:32772 was reset
| 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread
terminated
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] closing sockets
and stopping threads
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast receive
socket closed
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast send
socket closed
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast thread
terminated
| 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] socket closed
| 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver
socket is closed, exception=java.net.SocketException: Socket closed
| 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver
thread terminated
| 2007-03-06 14:20:24,082 DEBUG
[org.jboss.ha.framework.server.ClusterPartition] Starting failed
jboss:service=wubrothers
| java.lang.IllegalStateException: Initial state transfer failed:
Channel.getState() returned false
| at
org.jboss.ha.framework.server.HAPartitionImpl.fetchState(HAPartitionImpl.java:351)
|
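A quick check of the timestamps in the Node B trace: the failure is logged almost exactly 30 seconds after "Fetching state (will wait for 30000 milliseconds)". That would be consistent with getState() giving up at its timeout rather than failing immediately, although the trace alone does not prove it. The sketch below is only arithmetic over two timestamps copied from the log; the helper names are made up for illustration.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Node B trace.
FETCH_START = "2007-03-06 14:19:39,049"   # "Fetching state (will wait for 30000 milliseconds)"
FETCH_FAILED = "2007-03-06 14:20:09,055"  # "Initial state transfer failed"
CONFIGURED_TIMEOUT_MS = 30000             # value printed in the "Fetching state" line


def parse_log_time(stamp):
    """Parse a log timestamp such as '2007-03-06 14:19:39,049'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def elapsed_ms(start, end):
    """Whole milliseconds between two log timestamps."""
    delta = parse_log_time(end) - parse_log_time(start)
    return delta // timedelta(milliseconds=1)


waited = elapsed_ms(FETCH_START, FETCH_FAILED)
print(waited)                          # 30006
print(waited - CONFIGURED_TIMEOUT_MS)  # 6
```

So Node B waited 30006 ms, only 6 ms past the configured 30000 ms, before it closed the channel.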
Node A still shows no exceptions. Here's the trace from Node A at the point
where it received the LEAVE request:
anonymous wrote : 2007-03-06 14:04:45,864 DEBUG [org.jgroups.protocols.FD]
received ack from 192.168.1.105:32825
| 2007-03-06 14:04:45,920 DEBUG [org.jgroups.protocols.pbcast.GMS] received
LEAVE_REQ for 192.168.1.105:32825 from 192.168.1.105:32825
| 2007-03-06 14:04:45,973 DEBUG [org.jgroups.protocols.pbcast.GMS] new=[],
suspected=[], leaving=[192.168.1.105:32825], new view:
[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS]
view=[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS]
[local_addr=192.168.1.100:32772] view is [192.168.1.100:32772|2]
[192.168.1.100:32772]
|
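When comparing the two traces, note that the clocks on the two nodes do not agree: Node B logs the first LEAVE request at 14:20:09,056, while Node A logs receiving a LEAVE_REQ at 14:04:45,920. Assuming those two lines refer to the same request (an assumption, since Node B sent several) and ignoring network latency, the offset can be estimated as below. The function names are hypothetical helpers, not part of JBoss or JGroups.

```python
from datetime import datetime, timedelta

# Timestamps copied from the two traces.
LEAVE_SENT_B = "2007-03-06 14:20:09,056"      # Node B: "sending LEAVE request"
LEAVE_RECEIVED_A = "2007-03-06 14:04:45,920"  # Node A: "received LEAVE_REQ"


def parse_log_time(stamp):
    """Parse a log timestamp such as '2007-03-06 14:20:09,056'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def clock_offset(stamp_b, stamp_a):
    """Estimate how far Node B's clock runs ahead of Node A's."""
    return parse_log_time(stamp_b) - parse_log_time(stamp_a)


def to_node_a_time(stamp_b, offset):
    """Translate a Node B timestamp into Node A's clock."""
    return parse_log_time(stamp_b) - offset


offset = clock_offset(LEAVE_SENT_B, LEAVE_RECEIVED_A)
print(offset)  # 0:15:23.136000
```

With this estimate, Node B's clock is roughly 15 minutes 23 seconds ahead, so lines from the Node B trace need that much subtracted before they can be lined up against Node A's log.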
Any other ideas?
Phil
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4025545#4025545