Bugs item #782678, was opened at 2003-08-04 10:18
Message generated for change (Comment added) made by slaboure
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=376685&aid=782678&group_id=22866

Category: Clustering
>Group: v3.2
>Status: Closed
>Resolution: Fixed
Priority: 6
Submitted By: Sacha Labourey (slaboure)
Assigned to: Sacha Labourey (slaboure)
Summary: DistributedReplicantManager.isMasterReplica(String) false +

Initial Comment:
There is a race condition in
DistributedReplicantManager.isMasterReplica(String) that
shows up when this method is called from within
notifyKeyListeners, as shown by this stack trace:

Thread "main"@65 status: RUNNING
- isMasterReplica():437, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- isDRMMasterReplica():234, org.jboss.ha.jmx.HAServiceMBeanSupport
- partitionTopologyChanged():103, org.jboss.ha.singleton.HASingletonSupport
- replicantsChanged():197, org.jboss.ha.jmx.HAServiceMBeanSupport$1
- notifyKeyListeners():675, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- add():326, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
- registerDRMListener():204, org.jboss.ha.jmx.HAServiceMBeanSupport
- startService():144, org.jboss.ha.jmx.HAServiceMBeanSupport

This is due to the choice to return true when the key in
question is in the localReplicants table but not in the
replicants table:

    public boolean isMasterReplica (String key)
    {
       if (!localReplicants.containsKey (key))
          return false;

       Vector allNodes = this.partition.getCurrentView ();
       HashMap repForKey = (HashMap)replicants.get (key);
       if (repForKey==null)
          return true; // ???? why true when nothing is known about the other nodes?

       // ... remainder of the method not quoted ...
    }

This seems to be an ambiguous condition, as it occurs when a
node has called add but its state has not yet synched, or has
failed to synch. Another problem I'm seeing, at least in the
context of the singleton service, is that the notion of the
master node is unstable. Here is the output from one of 3
nodes running the singleton service, starting with the
addition of the final node, shown as view 2.

15:35:44,637 INFO  [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 5s:948ms
15:36:27,719 INFO  [DefaultPartition] New cluster view: 2 ([lamia:32947, 172.17.66.54:2821, ironmaiden:51770] delta: 1)
15:36:27,749 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
15:37:13,555 INFO  [DefaultPartition] New cluster view (id: 3, delta: -1) : [172.17.66.54:2821, ironmaiden:51770]
15:37:13,575 INFO  [DefaultPartition:ReplicantManager] Dead members: 1
15:38:13,321 INFO  [HASingletonMBeanExample] Notified to start as singleton
15:38:13,321 INFO  [DefaultPartition] New cluster view (id: 4, delta: 1) : [172.17.66.54:2821, ironmaiden:51770, lamia:32949]
15:38:13,331 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
15:38:13,361 INFO  [HASingletonMBeanExample] Notified to stop as singleton
15:39:13,447 INFO  [HASingletonMBeanExample] Notified to start as singleton
15:39:13,457 INFO  [HASingletonMBeanExample] Notified to stop as singleton

With view 3, the original node (the singleton) is killed, and
the node whose console output is shown above (172.17.66.54) is
selected as the singleton. When the third node is started
again, there is some thrashing: the 2 existing nodes both
select themselves as the singleton and tell the other to stop,
and it appears that no singleton is chosen. The problem seems
to be inconsistent matching of member names: one node only
knows the other's IP, while the other node knows the
hostnames. Here is the console view of the second node,
showing the hostnames and its thrashing:

15:25:21,023 INFO  [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 13s:597ms
15:26:05,562 INFO  [DefaultPartition] New cluster view: 3 ([succubus:2821, ironmaiden:51770] delta: -1)
15:26:05,573 INFO  [DefaultPartition:ReplicantManager] Dead members: 1
15:27:05,506 INFO  [HASingletonMBeanExample] Notified to start as singleton
15:27:05,509 INFO  [DefaultPartition] New cluster view: 4 ([succubus:2821, ironmaiden:51770, lamia:32949] delta: 1)
15:27:05,513 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
15:27:05,531 INFO  [HASingletonMBeanExample] Notified to stop as singleton
15:28:05,520 INFO  [HASingletonMBeanExample] Notified to start as singleton
15:28:05,526 INFO  [HASingletonMBeanExample] Notified to stop as singleton

It's not clear that DistributedReplicantManager.isMasterReplica
was designed to be used for selecting a singleton node, but if
it is, the logic needs to be firmed up. If not, the singleton
service needs to be built on something else.
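To make the race concrete, here is a simplified, hypothetical model of master selection. It assumes the rule is "walk the current view in order; the first member holding a replicant for the key is the master", with localKeys standing in for localReplicants and the replicants table holding only what this node knows about other nodes. It is not the DistributedReplicantManagerImpl source; the class, the table() helper and the node names are illustrative only. It shows that, while the remote table has no entry for the key, every node that has called add answers true.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of master selection (NOT the JBoss source).
class MasterSelectionSketch {

    static boolean isMasterReplica(String key, String self, List<String> view,
                                   Set<String> localKeys,
                                   Map<String, Map<String, Object>> replicants) {
        if (!localKeys.contains(key)) {
            return false;                 // not registered on this node at all
        }
        Map<String, Object> repForKey = replicants.get(key);
        if (repForKey == null) {
            return true;                  // the questioned branch: no remote state known yet
        }
        for (String node : view) {        // assumed rule: view order decides
            if (node.equals(self)) {
                return true;              // we come first among the holders
            }
            if (repForKey.containsKey(node)) {
                return false;             // an earlier member holds a replicant
            }
        }
        return false;                     // our own name does not appear in the view
    }

    /** Builds a remote-replicant table in which each given node holds a replicant for key. */
    static Map<String, Map<String, Object>> table(String key, String... nodes) {
        Map<String, Object> repForKey = new HashMap<>();
        for (String n : nodes) {
            repForKey.put(n, "replicant@" + n);
        }
        Map<String, Map<String, Object>> t = new HashMap<>();
        t.put(key, repForKey);
        return t;
    }

    public static void main(String[] args) {
        List<String> view = List.of("nodeA:1099", "nodeB:1099");
        Set<String> local = Set.of("svc");

        // Before state transfer neither node knows about the other: both claim mastership.
        System.out.println(isMasterReplica("svc", "nodeA:1099", view, local, new HashMap<>())); // true
        System.out.println(isMasterReplica("svc", "nodeB:1099", view, local, new HashMap<>())); // true

        // After state transfer each node sees the other's replicant: only nodeA is master.
        System.out.println(isMasterReplica("svc", "nodeA:1099", view, local, table("svc", "nodeB:1099"))); // true
        System.out.println(isMasterReplica("svc", "nodeB:1099", view, local, table("svc", "nodeA:1099"))); // false
    }
}
```

In this model a name mismatch has a similar effect: if a node's view names an earlier member by IP (e.g. "172.17.66.54:2821") while its replicants table is keyed by hostname (e.g. "succubus:2821"), containsKey never matches, the earlier member is skipped, and the later node also concludes it is the master.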

-- 
xxxxxxxxxxxxxxxxxxxxxxxx
Scott Stark
Chief Technology Officer
JBoss Group, LLC
xxxxxxxxxxxxxxxxxxxxxxxx

----------------------------------------------------------------------

>Comment By: Sacha Labourey (slaboure)
Date: 2003-08-17 22:40

Message:
Logged In: YES 
user_id=95900

JBossHA node naming is now dissociated from JavaGroups
naming. The node name can be set explicitly but, by default,
a name is created by JBoss at startup (localIP:JNDI_PORT as a
first strategy). This should fix some of the singleton issues
seen.
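A minimal sketch of the naming strategy described above, assuming an explicitly configured name takes precedence and the fallback is the textual local IP address plus the JNDI port. The class and method names are hypothetical, and 1099 (the usual JBoss JNDI port) is used only as an example value; this is not the JBoss code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: explicit node name if configured, else "<local IP>:<JNDI port>".
class NodeNameSketch {

    static String resolveNodeName(String explicitName, InetAddress localAddress, int jndiPort) {
        if (explicitName != null && !explicitName.isEmpty()) {
            return explicitName;          // an explicitly configured name wins
        }
        // getHostAddress() returns the textual IP, not the host name, so the
        // generated name does not depend on how DNS happens to resolve this host.
        return localAddress.getHostAddress() + ":" + jndiPort;
    }

    public static void main(String[] args) throws UnknownHostException {
        // A real server would use its bind address or InetAddress.getLocalHost();
        // a fixed address (taken from the logs above) keeps the example deterministic.
        InetAddress addr = InetAddress.getByAddress("succubus", new byte[] {(byte) 172, 17, 66, 54});
        System.out.println(resolveNodeName(null, addr, 1099));     // 172.17.66.54:1099
        System.out.println(resolveNodeName("node-1", addr, 1099)); // node-1
    }
}
```

Because each node computes its own name once and publishes it, every member refers to it by the same string, which avoids the IP-versus-hostname mismatch visible in the two logs above.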

Furthermore, the DRM.add method now makes synchronous calls
over the cluster, to prevent DRM.isMasterReplica from
considering the local node a master simply because its state
is not yet synched with the other nodes. This should solve
the remaining singleton issues.
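A hypothetical sketch of what a synchronous add buys: the node calls each peer in turn, waits for its answer, merges the returned replicants, and only then records the key as locally registered. The Peer interface, its register() signature and the ordering of the two table updates are assumptions for illustration, not the actual DRM or HAPartition API; a real remote call would also need timeout and failure handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT the JBoss DRM implementation.
class SyncAddSketch {

    /** Hypothetical blocking call to one peer: it records our replicant and
     *  returns its own replicant for the key, or null if it has none. */
    @FunctionalInterface
    interface Peer {
        Object register(String key, String fromNode, Object replicant);
    }

    final String self;
    // key -> (remote node name -> replicant), as known to this node
    final Map<String, Map<String, Object>> replicants = new ConcurrentHashMap<>();
    // key -> this node's own replicant
    final Map<String, Object> localReplicants = new ConcurrentHashMap<>();

    SyncAddSketch(String self) {
        this.self = self;
    }

    void add(String key, Object replicant, Map<String, Peer> peers) {
        Map<String, Object> answers = new ConcurrentHashMap<>();
        for (Map.Entry<String, Peer> e : peers.entrySet()) {
            // Synchronous: wait for this peer's answer before contacting the next one.
            Object theirs = e.getValue().register(key, self, replicant);
            if (theirs != null) {
                answers.put(e.getKey(), theirs);
            }
        }
        // Publish the synched remote state first, then the local registration.
        replicants.computeIfAbsent(key, k -> new ConcurrentHashMap<>()).putAll(answers);
        localReplicants.put(key, replicant);
    }

    public static void main(String[] args) {
        SyncAddSketch nodeC = new SyncAddSketch("nodeC:1099");
        Map<String, Peer> peers = new LinkedHashMap<>();
        peers.put("nodeA:1099", (key, from, rep) -> "repA"); // already holds a replicant
        peers.put("nodeB:1099", (key, from, rep) -> null);   // holds none
        nodeC.add("svc", "repC", peers);
        System.out.println(nodeC.replicants.get("svc"));      // {nodeA:1099=repA}
        System.out.println(nodeC.localReplicants.get("svc")); // repC
    }
}
```

Recording localReplicants last means a check like the one quoted in the bug report returns false, rather than true, while synchronization is still in progress.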

As part of these changes, a farming bug has also been fixed:
it caused already-deployed applications to be redeployed on
running nodes whenever a new node was started (seen most
often with at least 3 nodes).

Please TEST this new version and provide feedback if 
something is broken by this change.


----------------------------------------------------------------------

_______________________________________________
JBoss-Development mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-development
