Bugs item #782678, was opened at 2003-08-04 10:18
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=782678&group_id=22866
Category: Clustering
Group: None
Status: Open
Resolution: None
Priority: 6
Submitted By: Sacha Labourey (slaboure)
Assigned to: Sacha Labourey (slaboure)
Summary: DistributedReplicantManager.isMasterReplica(String) false +
Initial Comment:
There is a race condition in
DistributedReplicantManager.isMasterReplica(String) that
shows up when this method is called from within a
notifyKeyListeners, as shown by this stack trace:
Thread "main"@65 status: RUNNING
- isMasterReplica():437, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- isDRMMasterReplica():234, org.jboss.ha.jmx.HAServiceMBeanSupport
- partitionTopologyChanged():103, org.jboss.ha.singleton.HASingletonSupport
- replicantsChanged():197, org.jboss.ha.jmx.HAServiceMBeanSupport$1
- notifyKeyListeners():675, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- add():326, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- registerDRMListener():204, org.jboss.ha.jmx.HAServiceMBeanSupport
- startService():144, org.jboss.ha.jmx.HAServiceMBeanSupport
This is due to the choice to return true when the key in
question is in the localReplicants table, but not in the
replicants table:
public boolean isMasterReplica (String key)
{
   if (!localReplicants.containsKey (key))
      return false;
   Vector allNodes = this.partition.getCurrentView ();
   HashMap repForKey = (HashMap)replicants.get (key);
   if (repForKey==null)
      return true; // ????
This seems to be an ambiguous condition, as it exists for a
node that calls add when the state has not synched or has
failed to synch. Another problem I'm seeing, at least in the
context of the singleton service, is that the notion of the
master node is unstable. Here is the output from one of 3
nodes running the singleton service, starting with the
addition of the final node shown as view 2.
15:35:44,637 INFO [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 5s:948ms
15:36:27,719 INFO [DefaultPartition] New cluster view: 2 ([lamia:32947, 172.17.66.54:2821, ironmaiden:51770] delta: 1)
15:36:27,749 INFO [DefaultPartition:ReplicantManager] Dead members: 0
15:37:13,555 INFO [DefaultPartition] New cluster view (id: 3, delta: -1) : [172.17.66.54:2821, ironmaiden:51770]
15:37:13,575 INFO [DefaultPartition:ReplicantManager] Dead members: 1
15:38:13,321 INFO [HASingletonMBeanExample] Notified to start as singleton
15:38:13,321 INFO [DefaultPartition] New cluster view (id: 4, delta: 1) : [172.17.66.54:2821, ironmaiden:51770, lamia:32949]
15:38:13,331 INFO [DefaultPartition:ReplicantManager] Dead members: 0
15:38:13,361 INFO [HASingletonMBeanExample] Notified to stop as singleton
15:39:13,447 INFO [HASingletonMBeanExample] Notified to start as singleton
15:39:13,457 INFO [HASingletonMBeanExample] Notified to stop as singleton
With view 3, the original node and singleton is killed, and
the node to which this console output corresponds
(172.17.66.54) is selected as the singleton. When the third
node is started again, there is some thrashing due to the
existing 2 nodes both selecting themselves as the singleton
and telling the other to stop, and it appears that no
singleton is chosen. The problem seems to be inconsistent
matching of member names: one node only knows its IP address
while the other node knows the hostnames. Here is the console
view of the second node showing the hostnames and its
thrashing:
15:25:21,023 INFO [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 13s:597ms
15:26:05,562 INFO [DefaultPartition] New cluster view: 3 ([succubus:2821, ironmaiden:51770] delta: -1)
15:26:05,573 INFO [DefaultPartition:ReplicantManager] Dead members: 1
15:27:05,506 INFO [HASingletonMBeanExample] Notified to start as singleton
15:27:05,509 INFO [DefaultPartition] New cluster view: 4 ([succubus:2821, ironmaiden:51770, lamia:32949] delta: 1)
15:27:05,513 INFO [DefaultPartition:ReplicantManager] Dead members: 0
15:27:05,531 INFO [HASingletonMBeanExample] Notified to stop as singleton
15:28:05,520 INFO [HASingletonMBeanExample] Notified to start as singleton
15:28:05,526 INFO [HASingletonMBeanExample] Notified to stop as singleton
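
To make the name mismatch concrete, here is a minimal sketch of
comparing member names after mapping the host part to one canonical
form. It is not the DRM code: the class MemberNames, its methods and
the resolver function are hypothetical, and the mapping of succubus to
172.17.66.54 is only inferred from both consoles showing port 2821 in
the same position of the view.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of JBoss: compares "host:port" member
// names after mapping the host part to a canonical form.
class MemberNames {

    // Returns "canonicalHost:port". The resolver is supplied by the caller;
    // a real one might do a DNS lookup, here it is just a function.
    static String canonical(String member, Function<String, String> resolver) {
        int idx = member.lastIndexOf(':');
        String host = idx < 0 ? member : member.substring(0, idx);
        String port = idx < 0 ? "" : member.substring(idx);
        return resolver.apply(host) + port;
    }

    // Two names denote the same member if their canonical forms are equal.
    static boolean sameMember(String a, String b, Function<String, String> resolver) {
        return canonical(a, resolver).equals(canonical(b, resolver));
    }

    public static void main(String[] args) {
        // Assumed mapping, inferred from the two console outputs.
        Map<String, String> hosts = new HashMap<>();
        hosts.put("succubus", "172.17.66.54");
        Function<String, String> resolver = h -> hosts.getOrDefault(h, h);

        // A plain string comparison sees two different members...
        System.out.println("172.17.66.54:2821".equals("succubus:2821"));
        // ...while the canonical comparison treats them as one.
        System.out.println(sameMember("172.17.66.54:2821", "succubus:2821", resolver));
    }
}
```

If each node compares raw strings, a node that knows itself by IP can
fail to find itself in a view that another node built from hostnames,
which would be consistent with both nodes picking themselves.
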
It's not clear that
DistributedReplicantManager.isMasterReplica was designed to
be used for the selection of a singleton node, but if it is,
the logic needs to be firmed up. If not, the singleton
service needs to be built on something else.
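
One possible way to firm up the logic is to stop folding the
"state not synched yet" case into true. The sketch below is only an
illustration under assumptions: the class MasterCheckSketch, the
MasterState enum and the synched flag are hypothetical, and it assumes
the master is meant to be the first member of the current view that
holds a replicant for the key (the rest of isMasterReplica is not
shown in this report).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the DistributedReplicantManagerImpl code.
class MasterCheckSketch {
    enum MasterState { NOT_REGISTERED, UNKNOWN, MASTER, NOT_MASTER }

    private final String me;
    private final Set<String> localReplicants = new HashSet<>();
    // key -> names of remote nodes holding a replicant for that key
    private final Map<String, Set<String>> replicants = new HashMap<>();
    private boolean synched = false; // assumed flag: remote state received

    MasterCheckSketch(String me) { this.me = me; }

    void addLocal(String key) { localReplicants.add(key); }

    void addRemote(String key, String node) {
        replicants.computeIfAbsent(key, k -> new HashSet<>()).add(node);
    }

    void markSynched() { synched = true; }

    MasterState masterState(String key, List<String> currentView) {
        if (!localReplicants.contains(key)) return MasterState.NOT_REGISTERED;
        Set<String> remote = replicants.get(key);
        // The ambiguous case: registered locally, remote state not seen yet.
        if (remote == null && !synched) return MasterState.UNKNOWN;
        // Assumed rule: first view member holding a replicant is the master.
        for (String node : currentView) {
            if (node.equals(me)) return MasterState.MASTER;
            if (remote != null && remote.contains(node)) return MasterState.NOT_MASTER;
        }
        return MasterState.UNKNOWN; // this node is not in the view at all
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B");
        MasterCheckSketch b = new MasterCheckSketch("B");
        b.addLocal("svc");
        System.out.println(b.masterState("svc", view)); // before synch
        b.addRemote("svc", "A");
        b.markSynched();
        System.out.println(b.masterState("svc", view)); // after synch
    }
}
```

A caller such as a singleton service could then defer its start/stop
decision while the answer is UNKNOWN instead of acting on a guess.
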
--
xxxxxxxxxxxxxxxxxxxxxxxx
Scott Stark
Chief Technology Officer
JBoss Group, LLC
xxxxxxxxxxxxxxxxxxxxxxxx