Bugs item #808170, was opened at 2003-09-17 23:19
Message generated for change (Comment added) made by slaboure
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=808170&group_id=22866
Category: Clustering
Group: v3.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Adrian Brock (ejort)
Assigned to: Nobody/Anonymous (nobody)
Summary: Partition problems 3.2.2RC4
Initial Comment:
JBoss-3.2.2RC4 (current CVS)
Java 1.4.2
Red Hat 9
I'm seeing problems with nodes in a partition discovering
each other.
Sometimes it works,
sometimes it doesn't (see attached console.log and
console2.log).
I have cluster logs as well if you need them.
Shutdown blocks on the partition when it fails
(see thread dumps at end of logs).
Sometimes I get a NullPointerException:
03-09-17 19:07:32,545 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
Get current members
2003-09-17 19:07:32,546 INFO
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
Number of cluster members: 2
2003-09-17 19:07:32,556 INFO
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
Other members: 1
2003-09-17 19:07:32,566 WARN
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
No additional information has been found in the JavaGroup
address: make sure you are running with a correct version of
JavaGroups and that the protocol you are using supports
the 'additionalData' behaviour
2003-09-17 19:07:32,577 ERROR
[org.jboss.ha.framework.server.ClusterPartition] Starting failed
java.lang.NullPointerException
at org.jboss.ha.framework.server.HAPartitionImpl.verifyNodeIsUnique(HAPartitionImpl.java:763)
at org.jboss.ha.framework.server.HAPartitionImpl.startPartition(HAPartitionImpl.java:236)
at org.jboss.ha.framework.server.ClusterPartition.startService(ClusterPartition.java:293)
at org.jboss.system.ServiceMBeanSupport.start(ServiceMBeanSupport.java:192)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
Regards,
Adrian
----------------------------------------------------------------------
>Comment By: Sacha Labourey (slaboure)
Date: 2003-09-18 10:27
Message:
Logged In: YES
user_id=95900
That is really strange. Bela, any idea on what is happening?
The thing is that:
- I create the JChannel
- send a config event down the stack THAT CONTAINS
additional data
- connect the JChannel
- ask the JChannel for my node name and the view
and my node name does NOT contain the additional data!?!
Here is the excerpt from the log that shows that:
2003-09-17 21:59:33,277 INFO
[org.jboss.ha.framework.server.ClusterPartition] Starting
=> SL: Here, I've built my "additional data" using some
information from the currently running server. No exception
was thrown and nothing was logged, so everything went fine in
ClusterPartition.generateUniqueNodeName()
=> SL: next call is JChannel.connect():
2003-09-17 21:59:33,288 DEBUG
[org.jboss.ha.framework.server.ClusterPartition] Starting
ClusterPartition: DefaultPartition
2003-09-17 21:59:33,294 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] UDP.start(): creating sockets and starting
threads
2003-09-17 21:59:33,296 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] UDP.createSockets(): unicast sockets will use
interface 192.168.0.51
2003-09-17 21:59:33,305 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] UDP.createSockets(): socket information:
local_addr=htimes2:34107, mcast_addr=228.1.2.3:45566,
bind_addr=/192.168.0.51, ttl=64
send socket: bound to 192.168.0.51:34106, send buffer
size=65535
receive socket: bound to 192.168.0.51:34107, receive buffer
size=65535
multicast socket: bound to 192.168.0.51:45566, send buffer
size=65535, receive buffer size=65535
2003-09-17 21:59:33,312 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] UDP.startThreads(): created unicast receiver
thread
2003-09-17 21:59:33,431 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] FRAG.down(): received CONFIG event:
[EMAIL PROTECTED]
2003-09-17 21:59:33,432 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] NAKACK.down(): received CONFIG event:
[EMAIL PROTECTED]
2003-09-17 21:59:33,434 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] UDP.down(): received CONFIG event:
[EMAIL PROTECTED]
=> SL: Here we can see that the UDP layer (above) has some
additional_data received by the CONFIG event: [EMAIL PROTECTED]
2003-09-17 21:59:33,445 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:33 BST
2003] [INFO] PING.down(): FIND_INITIAL_MBRS
2003-09-17 21:59:35,459 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] PING.down(): initial mbrs are []
2003-09-17 21:59:35,461 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [DEBUG] ClientGmsImpl.join(): initial_mbrs are []
2003-09-17 21:59:35,461 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] ClientGmsImpl.join(): no initial members
discovered: creating group as first member
2003-09-17 21:59:35,463 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] GMS.installView(): view is [htimes2:34107|0]
[htimes2:34107]
2003-09-17 21:59:35,479 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] GMS.setImpl(): changed role to
org.javagroups.protocols.pbcast.CoordGmsImpl
2003-09-17 21:59:35,479 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] GMS.becomeCoordinator(): htimes2:34107
became coordinator
2003-09-17 21:59:35,480 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] ClientGmsImpl.becomeSingletonMember():
created group (first member). My view is [htimes2:34107|0],
impl is org.javagroups.protocols.pbcast.CoordGmsImpl
2003-09-17 21:59:35,605 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] STABLE.startStableTask(): stable task started;
num_gossip_runs=3, max_gossip_runs=3
2003-09-17 21:59:35,627 TRACE
[org.javagroups.DefaultPartition] [Wed Sep 17 21:59:35 BST
2003] [INFO] MERGE2.FindSubgroups.run(): merge task started
2003-09-17 21:59:35,834 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
ViewAccepted: initial members set
2003-09-17 21:59:35,835 INFO
[org.jboss.ha.framework.server.ClusterPartition] Starting
channel
=> SL: at this point, the call to JChannel.connect() has finished
and we start the HAPartition, which calls
JChannel.getLocalAddress(), and the received JG IPAddress
object does NOT contain the additional_data: WHY?
2003-09-17 21:59:35,836 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
get nodeName
2003-09-17 21:59:35,836 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
Get current members
2003-09-17 21:59:35,837 INFO
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
Number of cluster members: 1
2003-09-17 21:59:35,837 INFO
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
Other members: 0
2003-09-17 21:59:35,837 WARN
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
No additional information has been found in the JavaGroup
address: make sure you are running with a correct version of
JavaGroups and that the protocol you are using supports
the 'additionalData' behaviour
Any idea Bela?
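For reference, the four steps listed at the top of this comment can be
modelled roughly as below. This is only a sketch of the behaviour I
expect, not JavaGroups code: the Channel and Address types, the
sendConfig() method and the "additional_data" map key are hypothetical
stand-ins invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins; none of these types are the real JavaGroups classes. */
final class AdditionalDataSketch {

    /** Minimal address: host:port plus an optional opaque payload. */
    static final class Address {
        final String hostPort;
        final byte[] additionalData; // null when no payload was attached

        Address(String hostPort, byte[] additionalData) {
            this.hostPort = hostPort;
            this.additionalData = additionalData;
        }
    }

    /** Minimal channel: remembers configuration sent before connect(). */
    static final class Channel {
        private final Map<String, Object> config = new HashMap<>();
        private Address localAddress;

        /** Step 2: pass configuration down the stack before connecting. */
        void sendConfig(Map<String, Object> cfg) {
            config.putAll(cfg);
        }

        /** Step 3: connecting fixes the local address, including any payload. */
        void connect(String hostPort) {
            localAddress = new Address(hostPort, (byte[]) config.get("additional_data"));
        }

        /** Step 4: the address the partition inspects afterwards. */
        Address getLocalAddress() {
            return localAddress;
        }
    }

    /** Runs the four steps in order and returns the resulting local address. */
    static Address run(byte[] payload) {
        Channel channel = new Channel();          // 1. create the channel
        Map<String, Object> cfg = new HashMap<>();
        cfg.put("additional_data", payload);
        channel.sendConfig(cfg);                  // 2. send the config event
        channel.connect("htimes2:34107");         // 3. connect
        return channel.getLocalAddress();         // 4. ask for the local address
    }
}
```

In this model the payload passed in step 2 is always present on the
address returned in step 4, which is what the log above shows is not
happening.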
----------------------------------------------------------------------
Comment By: Adrian Brock (ejort)
Date: 2003-09-18 01:51
Message:
Logged In: YES
user_id=9459
cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
192.168.0.51 htimes2
127.0.0.1 localhost.localdomain localhost
127.0.0.1 4pcvs
There are no other instances of JBoss on the network.
Sorry, I can't provide netstat; I'm no longer running the
configuration.
One thing occurs to me (I'm configured with a default gateway):
maybe the gateway is interfering with UDP? I'll try it again
without the default gateway setting.
Regards,
Adrian
----------------------------------------------------------------------
Comment By: Scott M Stark (starksm)
Date: 2003-09-18 01:37
Message:
Logged In: YES
user_id=175228
I have an RH9 system with JDK 1.4.2 that I have tried this
recipe on a few times and I don't see the problem. One issue
with the "... 'additionalData'" warning is that if it
occurs you are guaranteed to see an NPE, so we should just be
throwing an exception rather than logging a msg and then
proceeding to an NPE.
Do you have another cluster running somewhere else? I'll
have to look into the logs.
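A minimal sketch of the fail-fast check suggested above. It is not the
actual HAPartitionImpl code: the NodeNameCheck class, the
requireNodeName() method and the UTF-8 decoding of the payload are
assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of JBoss or JavaGroups. */
final class NodeNameCheck {

    /**
     * Returns the node name carried in the address's additional data,
     * or throws immediately with a descriptive message if none is present,
     * instead of logging a warning and failing later with an NPE.
     */
    static String requireNodeName(byte[] additionalData) {
        if (additionalData == null) {
            throw new IllegalStateException(
                "No additional information has been found in the JavaGroups address: "
                + "check the JavaGroups version and that the protocol in use "
                + "supports the 'additionalData' behaviour");
        }
        // Assumption: the payload is the node name encoded as UTF-8.
        return new String(additionalData, StandardCharsets.UTF_8);
    }
}
```

With something like this, startup would stop at the check with the
explanatory message rather than at the later NullPointerException.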
----------------------------------------------------------------------
Comment By: Adrian Brock (ejort)
Date: 2003-09-17 23:23
Message:
Logged In: YES
user_id=9459
To construct my two nodes:
cp -r all all2
edit all/conf/jboss-service.xml (uncomment service-binding)
./run.sh -c all and ./run.sh -c all2
Regards,
Adrian
----------------------------------------------------------------------
_______________________________________________
JBoss-Development mailing list
https://lists.sourceforge.net/lists/listinfo/jboss-development