Bugs item #782678, was opened at 2003-08-04 10:18
Message generated for change (Comment added) made by slaboure
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=782678&group_id=22866
Category: Clustering
>Group: v3.2
>Status: Closed
>Resolution: Fixed
Priority: 6
Submitted By: Sacha Labourey (slaboure)
Assigned to: Sacha Labourey (slaboure)
Summary: DistributedReplicantManager.isMasterReplica(String) false +
Initial Comment:
There is a race condition in
DistributedReplicantManager.isMasterReplica(String) that
shows up when this method is called from within a
notifyKeyListeners, as shown by this stack trace:
Thread "main"@65 status: RUNNING
- isMasterReplica():437, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- isDRMMasterReplica():234, org.jboss.ha.jmx.HAServiceMBeanSupport
- partitionTopologyChanged():103, org.jboss.ha.singleton.HASingletonSupport
- replicantsChanged():197, org.jboss.ha.jmx.HAServiceMBeanSupport$1
- notifyKeyListeners():675, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- add():326, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- registerDRMListener():204, org.jboss.ha.jmx.HAServiceMBeanSupport
- startService():144, org.jboss.ha.jmx.HAServiceMBeanSupport
This is due to the choice to return true when the key in
question is in the localReplicants table, but not the
replicants table:
public boolean isMasterReplica (String key)
{
   if (!localReplicants.containsKey (key))
      return false;
   Vector allNodes = this.partition.getCurrentView ();
   HashMap repForKey = (HashMap)replicants.get (key);
   if (repForKey==null)
      return true; ????
This seems to be an ambiguous condition as this
condition exists for a node that
calls add and when the state has not synched or has
failed to synch. Another
problem I'm seeing at least in the context of the
singleton service is that the
notion of the master node is unstable. Here is the output
from one of 3 nodes
running the singleton service starting with the addition
of the final node shown
as view 2.
15:35:44,637 INFO [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 5s:948ms
15:36:27,719 INFO [DefaultPartition] New cluster view: 2 ([lamia:32947, 172.17.66.54:2821, ironmaiden:51770] delta: 1)
15:36:27,749 INFO [DefaultPartition:ReplicantManager] Dead members: 0
15:37:13,555 INFO [DefaultPartition] New cluster view (id: 3, delta: -1) : [172.17.66.54:2821, ironmaiden:51770]
15:37:13,575 INFO [DefaultPartition:ReplicantManager] Dead members: 1
15:38:13,321 INFO [HASingletonMBeanExample] Notified to start as singleton
15:38:13,321 INFO [DefaultPartition] New cluster view (id: 4, delta: 1) : [172.17.66.54:2821, ironmaiden:51770, lamia:32949]
15:38:13,331 INFO [DefaultPartition:ReplicantManager] Dead members: 0
15:38:13,361 INFO [HASingletonMBeanExample] Notified to stop as singleton
15:39:13,447 INFO [HASingletonMBeanExample] Notified to start as singleton
15:39:13,457 INFO [HASingletonMBeanExample] Notified to stop as singleton
With view 3 the original node and singleton is killed, and
the node to which this console output corresponds
(172.17.66.54) is selected as the singleton. When the third
node is started again there is some thrashing, because the
existing 2 nodes both select themselves as the singleton and
tell the other to stop, and it appears that no singleton is
chosen. The problem seems to be inconsistent matching of
member names. One node knows only its IP, while the other
node knows the hostnames. Here is the console view of the
second node showing the hostnames and its thrashing:
15:25:21,023 INFO [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 13s:597ms
15:26:05,562 INFO [DefaultPartition] New cluster view: 3 ([succubus:2821, ironmaiden:51770] delta: -1)
15:26:05,573 INFO [DefaultPartition:ReplicantManager] Dead members: 1
15:27:05,506 INFO [HASingletonMBeanExample] Notified to start as singleton
15:27:05,509 INFO [DefaultPartition] New cluster view: 4 ([succubus:2821, ironmaiden:51770, lamia:32949] delta: 1)
15:27:05,513 INFO [DefaultPartition:ReplicantManager] Dead members: 0
15:27:05,531 INFO [HASingletonMBeanExample] Notified to stop as singleton
15:28:05,520 INFO [HASingletonMBeanExample] Notified to start as singleton
15:28:05,526 INFO [HASingletonMBeanExample] Notified to stop as singleton
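To make the suspected name mismatch concrete, here is a minimal
Java sketch. The member names are taken from the two console logs;
the election rule ("the first member in view order that holds the
key wins") and the way remote holders are recorded are assumptions
made for illustration, not the actual DistributedReplicantManagerImpl
logic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified election rule; not the JBoss implementation. */
class MasterElectionSketch {

    /**
     * @param self          this node's own name as it appears in its view
     * @param view          ordered member names as this node sees them
     * @param remoteHolders names under which OTHER nodes registered the key
     */
    static boolean isMaster(String self, List<String> view, Set<String> remoteHolders) {
        for (String member : view) {
            if (remoteHolders.contains(member)) {
                return false; // an earlier member holds the key, so it wins
            }
            if (member.equals(self)) {
                return true;  // reached ourselves before any other holder
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // First node: knows itself by IP and sees ironmaiden as a holder.
        List<String> viewA = Arrays.asList("172.17.66.54:2821", "ironmaiden:51770");
        Set<String> holdersA = new HashSet<>(Arrays.asList("ironmaiden:51770"));

        // Second node: sees the first node as "succubus" in its view, but
        // (assumption) the replica was registered under the IP-based name.
        List<String> viewB = Arrays.asList("succubus:2821", "ironmaiden:51770");
        Set<String> holdersB = new HashSet<>(Arrays.asList("172.17.66.54:2821"));

        System.out.println(isMaster("172.17.66.54:2821", viewA, holdersA));
        System.out.println(isMaster("ironmaiden:51770", viewB, holdersB));
    }
}
```

Under these assumptions both calls return true, because the second
node never matches "succubus:2821" against the holder registered as
"172.17.66.54:2821". With a consistent name on both sides the second
node would return false.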
It's not clear that
DistributedReplicantManager.isMasterReplica was designed
to be used for the selection of a singleton node, but if it
is, the logic needs to be firmed up. If not, the singleton
service needs to be built on something else.
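The race described in this report can be reproduced in a toy model.
The sketch below reuses the names localReplicants, replicants, add,
notifyKeyListeners and isMasterReplica from the report, but the
class RaceSketch, the remoteStateArrived method, the observed list
and the "any known remote holder outranks us" rule are hypothetical
simplifications, not JBoss code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the race; the real class is far more involved. */
class RaceSketch {
    private final Map<String, Object> localReplicants = new HashMap<>();
    // key -> (remote node name -> replicant); filled only when remote state arrives
    private final Map<String, Map<String, Object>> replicants = new HashMap<>();
    // what a key listener would have seen on each notification
    final List<Boolean> observed = new ArrayList<>();

    void add(String key, Object replicant) {
        localReplicants.put(key, replicant);
        // The listener runs right away, before any remote state has been merged.
        notifyKeyListeners(key);
    }

    /** Hypothetical hook standing in for a later state transfer. */
    void remoteStateArrived(String key, String node, Object replicant) {
        replicants.computeIfAbsent(key, k -> new HashMap<>()).put(node, replicant);
        notifyKeyListeners(key);
    }

    private void notifyKeyListeners(String key) {
        observed.add(isMasterReplica(key));
    }

    boolean isMasterReplica(String key) {
        if (!localReplicants.containsKey(key)) {
            return false;
        }
        Map<String, Object> repForKey = replicants.get(key);
        if (repForKey == null) {
            return true; // the ambiguous branch questioned in the report
        }
        // Simplification: any known remote holder is assumed to outrank us.
        return repForKey.isEmpty();
    }
}
```

Calling add("HASingleton", x) followed by
remoteStateArrived("HASingleton", "ironmaiden:51770", y) leaves
observed as [true, false]: the listener is first told it is the
master and then told it is not, which resembles the start/stop
pairs in the console output.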
--
xxxxxxxxxxxxxxxxxxxxxxxx
Scott Stark
Chief Technology Officer
JBoss Group, LLC
xxxxxxxxxxxxxxxxxxxxxxxx
----------------------------------------------------------------------
>Comment By: Sacha Labourey (slaboure)
Date: 2003-08-17 22:40
Message:
JBossHA node naming is now dissociated from JavaGroups
naming. The node name can be explicitly set but, by default, a
name is created at startup by JBoss (localIP:JNDI_PORT as a
first strategy). This should fix some of the singleton issues
seen.
Furthermore, the DRM.add method now makes synchronous
calls over the cluster so that DRM.isMasterReplica does not
consider itself a master merely because state with other nodes
is not yet synched. This should solve the remaining singleton
issues.
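As a rough illustration of the synchronous add, the sketch below
blocks on a cluster call and merges the answer before any listener
is notified. The ClusterCall interface, its registerAndFetchHolders
method, the firstNotification field and the simplified master rule
are all hypothetical; the actual JBoss and JavaGroups calls are not
shown in this message.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a synchronous add; not the committed JBoss change. */
class SyncAddSketch {

    /** Hypothetical stand-in for a blocking cluster-wide call. */
    interface ClusterCall {
        // Registers (key, node, replicant) remotely and returns the
        // replicants the OTHER nodes already hold for that key.
        Map<String, Object> registerAndFetchHolders(String key, String node, Object replicant);
    }

    private final String nodeName;
    private final ClusterCall cluster;
    private final Map<String, Object> localReplicants = new HashMap<>();
    private final Map<String, Map<String, Object>> replicants = new HashMap<>();
    Boolean firstNotification; // what a listener saw on the first callback

    SyncAddSketch(String nodeName, ClusterCall cluster) {
        this.nodeName = nodeName;
        this.cluster = cluster;
    }

    void add(String key, Object replicant) {
        localReplicants.put(key, replicant);
        // Block until the other nodes have answered, then merge their state.
        Map<String, Object> holders = cluster.registerAndFetchHolders(key, nodeName, replicant);
        replicants.put(key, new HashMap<>(holders));
        // Only now is the listener notified, so it sees the merged state.
        if (firstNotification == null) {
            firstNotification = isMasterReplica(key);
        }
    }

    boolean isMasterReplica(String key) {
        if (!localReplicants.containsKey(key)) {
            return false;
        }
        Map<String, Object> repForKey = replicants.get(key);
        // Simplification: any known remote holder is assumed to outrank us.
        return repForKey == null || repForKey.isEmpty();
    }
}
```

With a ClusterCall that reports an existing holder, the first
notification already says "not master", so the listener is never
told to start and then immediately stop.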
As part of these changes a farming bug has been fixed which
was causing already-deployed apps to be re-deployed on
running nodes when starting a new node (most frequent
when having at least 3 nodes).
Please TEST this new version and provide feedback if
something is broken by this change.
----------------------------------------------------------------------
_______________________________________________
JBoss-Development mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-development