Another experiment:

I tried updating the JBoss lib to use the latest stable JGroups, 2.4.0.

This time Node B shows a slightly different trace.

anonymous wrote : 2007-03-06 14:20:09,055 DEBUG 
[org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel 
connected; closing channel -- Initial state transfer failed: 
Channel.getState() returned false
  | 

It looks like Node B hit an exception and then sent a LEAVE request to 
Node A. Here's a more detailed trace from Node B:

anonymous wrote : 2007-03-06 14:19:38,990 DEBUG [org.jgroups.protocols.FD_SOCK] 
VIEW_CHANGE received: [192.168.1.100:32772, 192.168.1.105:32825]
  | 2007-03-06 14:19:38,991 DEBUG [org.jgroups.protocols.FD] suspected_mbrs: 
[], after adjustment: []
  | 2007-03-06 14:19:38,993 DEBUG [org.jgroups.protocols.FD_SOCK] 
determinePingDest()=192.168.1.100:32772, pingable_mbrs=[192.168.1.100:32772, 
192.168.1.105:32825]
  | 2007-03-06 14:19:38,994 DEBUG [org.jgroups.protocols.FD_SOCK] 
ping_dest=192.168.1.100:32772, 
ping_sock=Socket[addr=/192.168.1.100,port=54423,localport=47344], 
cache={192.168.1.100:32772=192.168.1.100:54423}
  | 2007-03-06 14:19:39,047 DEBUG 
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] ViewAccepted: 
initial members set
  | 2007-03-06 14:19:39,047 DEBUG 
[org.jboss.ha.framework.server.ClusterPartition] Starting channel
  | 2007-03-06 14:19:39,048 DEBUG 
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] get nodeName
  | 2007-03-06 14:19:39,048 DEBUG 
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Get current members
  | 2007-03-06 14:19:39,048 INFO  
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Number of cluster 
members: 2
  | 2007-03-06 14:19:39,049 INFO  
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Other members: 1
  | 2007-03-06 14:19:39,049 INFO  
[org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Fetching state (will 
wait for 30000 milliseconds):
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] 
GET_STATE: asking 192.168.1.100:32772 for state
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] 
passing down a SUSPEND_STABLE event
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] 
suspending message garbage collection
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume 
task started, max_suspend_time=33000
  | 2007-03-06 14:19:43,168 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:19:43,208 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:19:46,920 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:19:49,000 DEBUG [org.jgroups.util.TimeScheduler] Running task 
true
  | 2007-03-06 14:19:49,000 DEBUG [org.jgroups.protocols.FD] sending 
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:19:49,001 DEBUG [org.jgroups.protocols.FD] received ack from 
192.168.1.100:32772
  | 2007-03-06 14:19:55,487 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:19:55,559 DEBUG [org.jgroups.util.TimeScheduler] Running task 
true
  | 2007-03-06 14:19:55,559 DEBUG [org.jgroups.protocols.FD] sending 
are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
  | 2007-03-06 14:19:55,599 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:19:55,633 DEBUG [org.jgroups.protocols.FD] received ack from 
192.168.1.100:32770
  | 2007-03-06 14:19:59,007 DEBUG [org.jgroups.util.TimeScheduler] Running task 
true
  | 2007-03-06 14:19:59,007 DEBUG [org.jgroups.protocols.FD] sending 
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:19:59,008 DEBUG [org.jgroups.protocols.FD] received ack from 
192.168.1.100:32772
  | 2007-03-06 14:20:09,015 DEBUG [org.jgroups.util.TimeScheduler] Running task 
true
  | 2007-03-06 14:20:09,015 DEBUG [org.jgroups.protocols.FD] sending 
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:20:09,016 DEBUG [org.jgroups.protocols.FD] received ack from 
192.168.1.100:32772
  | 2007-03-06 14:20:09,055 DEBUG 
[org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel 
connected; closing channel -- Initial state transfer failed: 
Channel.getState() returned false
  | 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.STABLE] 
resuming message garbage collection
  | 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.GMS] sending 
LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
  | 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] 
view=[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] 
view=[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:20:13,538 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:20:13,578 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:20:14,062 DEBUG [org.jgroups.protocols.pbcast.GMS] sending 
LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
  | 2007-03-06 14:20:14,371 DEBUG [org.jgroups.util.TimeScheduler] Running task 
6-6
  | 2007-03-06 14:20:14,978 DEBUG [org.jgroups.util.TimeScheduler] Running task 
6-6
  | 2007-03-06 14:20:15,494 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:20:15,566 DEBUG [org.jgroups.util.TimeScheduler] Running task 
true
  | 2007-03-06 14:20:15,566 DEBUG [org.jgroups.protocols.FD] sending 
are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
  | 2007-03-06 14:20:15,602 DEBUG [org.jgroups.util.TimeScheduler] Running task 
[EMAIL PROTECTED]
  | 2007-03-06 14:20:15,635 DEBUG [org.jgroups.protocols.FD] received ack from 
192.168.1.100:32770
  | 2007-03-06 14:20:16,186 DEBUG [org.jgroups.util.TimeScheduler] Running task 
6-6
  | 2007-03-06 14:20:18,594 DEBUG [org.jgroups.util.TimeScheduler] Running task 
6-6
  | 2007-03-06 14:20:19,022 DEBUG [org.jgroups.util.TimeScheduler] Running task 
true
  | 2007-03-06 14:20:19,023 DEBUG [org.jgroups.protocols.FD] sending 
are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:20:19,024 DEBUG [org.jgroups.protocols.FD] received ack from 
192.168.1.100:32772
  | 2007-03-06 14:20:19,070 DEBUG [org.jgroups.protocols.pbcast.GMS] sending 
LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
  | 2007-03-06 14:20:19,378 DEBUG [org.jgroups.util.TimeScheduler] Running task 
7-7
  | 2007-03-06 14:20:19,986 DEBUG [org.jgroups.util.TimeScheduler] Running task 
7-7
  | 2007-03-06 14:20:21,194 DEBUG [org.jgroups.util.TimeScheduler] Running task 
7-7
  | 2007-03-06 14:20:23,402 DEBUG [org.jgroups.util.TimeScheduler] Running task 
6-6
  | 2007-03-06 14:20:23,602 DEBUG [org.jgroups.util.TimeScheduler] Running task 
7-7
  | 2007-03-06 14:20:24,074 DEBUG [org.jgroups.protocols.pbcast.GMS] 
192.168.1.105:32825 changed role to 
org.jgroups.protocols.pbcast.ClientGmsImpl
  | 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] socket to 
192.168.1.100:32772 was reset
  | 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread 
terminated
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] closing sockets 
and stopping threads
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast receive 
socket closed
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast send 
socket closed
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast thread 
terminated
  | 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] socket closed
  | 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver 
socket is closed, exception=java.net.SocketException: Socket closed
  | 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver 
thread terminated
  | 2007-03-06 14:20:24,082 DEBUG 
[org.jboss.ha.framework.server.ClusterPartition] Starting failed 
jboss:service=wubrothers
  | java.lang.IllegalStateException: Initial state transfer failed: 
Channel.getState() returned false
  |         at 
org.jboss.ha.framework.server.HAPartitionImpl.fetchState(HAPartitionImpl.java:351)
  | 
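
One detail in the Node B trace above may be worth checking: the "Fetching state (will wait for 30000 milliseconds)" line is logged at 14:19:39,050 and the "Channel.getState() returned false" failure at 14:20:09,055. The small sketch below just subtracts those two timestamps. The class and method names are made up for illustration and are not part of JBoss or JGroups; the only assumption is that the log timestamps follow the "yyyy-MM-dd HH:mm:ss,SSS" layout seen in the trace.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the trace; not a JBoss or JGroups class.
public class StateTransferTiming {
    // Timestamp layout used by the log lines quoted in this post.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds between two log timestamps taken from the same node.
    static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TS);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TS);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // "Fetching state" line vs. "getState() returned false" line on Node B.
        long waited = elapsedMillis("2007-03-06 14:19:39,050",
                                    "2007-03-06 14:20:09,055");
        System.out.println("elapsed ms: " + waited);
    }
}
```

The gap comes out at 30005 ms, which lines up with the 30000 ms wait. That would be consistent with the state request simply timing out without a reply from Node A, rather than failing straight away, although the trace alone does not prove it.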


Node A still shows no exceptions. Here's the trace from Node A when it received 
the LEAVE request:

anonymous wrote : 2007-03-06 14:04:45,864 DEBUG [org.jgroups.protocols.FD] 
received ack from 192.168.1.105:32825
  | 2007-03-06 14:04:45,920 DEBUG [org.jgroups.protocols.pbcast.GMS] received 
LEAVE_REQ for 192.168.1.105:32825 from 192.168.1.105:32825
  | 2007-03-06 14:04:45,973 DEBUG [org.jgroups.protocols.pbcast.GMS] new=[], 
suspected=[], leaving=[192.168.1.105:32825], new view: 
[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] 
view=[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] 
[local_addr=192.168.1.100:32772] view is [192.168.1.100:32772|2] 
[192.168.1.100:32772]
  | 

Any other ideas?

Phil
