Hi Edward,

Apologies for this long message.  I will try to answer your questions.

 > What is the difference of RGMd and RGMx ?
 > I found that the start RGMd will register rgmx_hook_call.
 > yet I don't know the difference of rgmx and rgmd.

"rgmd" is the name of the RGM daemon process.  "rgmx" is a library of RGM 
extensions that were used on Europa "farm nodes".

As you might have seen on other threads, the Europa project extended the Solaris Cluster model to a "server + farm" model, which made Sun Cluster scale to much higher node counts (128-256+).  A Europa cluster would consist of two (or more) server nodes, plus a collection of farm nodes, which could number in the hundreds.

Unfortunately the Europa project was not quite released as a product; however, 
all of its code remains in the Solaris Cluster and OHAC code base.  We have not 
removed the code because we have not quite given up on the possibility of 
someday releasing Europa as a product.

In Europa, each server node runs the standard Solaris Cluster software; the server nodes are ordinary Solaris Cluster nodes, and the RGM president is always a server node.  Each farm node runs a lighter-weight variant of the Solaris Cluster software, including a reduced version of rgmd.  The farm nodes are controlled by a different membership monitor with less-stringent membership requirements: a farm node only needs to establish two-way connectivity with the RGM president node; we do not demand two-way connectivity between all pairs of farm nodes (as we do for server nodes).  This lighter-weight membership algorithm allowed us to scale to a large number of farm nodes.
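
To make the two membership rules concrete, here is a minimal sketch in C.  None of these names exist in the code base; can_reach() is a stand-in for whatever heartbeat/connectivity test the membership monitors actually perform:

#include <stdbool.h>

typedef unsigned int nodeid_t;

/* Stand-in for the real connectivity/heartbeat test. */
extern bool can_reach(nodeid_t from, nodeid_t to);

/*
 * Server-node membership: every pair of nodes must reach each other.
 * O(n^2) checks, which is one reason server-node counts stay small.
 */
bool
server_membership_ok(const nodeid_t *nodes, unsigned int n)
{
        for (unsigned int i = 0; i < n; i++)
                for (unsigned int j = 0; j < n; j++)
                        if (i != j && !can_reach(nodes[i], nodes[j]))
                                return (false);
        return (true);
}

/*
 * Farm-node membership: each farm node only needs two-way
 * connectivity with the president.  O(n) checks, so it scales to
 * hundreds of farm nodes.
 */
bool
farm_membership_ok(nodeid_t president, const nodeid_t *farm, unsigned int n)
{
        for (unsigned int i = 0; i < n; i++)
                if (!can_reach(president, farm[i]) ||
                    !can_reach(farm[i], president))
                        return (false);
        return (true);
}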

The same application management APIs (the RGM model) are used on both server nodes and farm nodes, and the same Data Service agents run on both; however, Europa did not support Oracle RAC on farm nodes, because RAC requires the strong membership semantics provided by base Solaris Cluster.

Now focusing on the rgm:  the cluster API (libscha and the scha_* functions) is implemented by the "receptionist" code in rgmd on every node.  Some requests, such as scha_control(), have to be handled by the rgm president.  In the standard (server-node) code, the slave-node rgmd makes an ORB invocation to the rgm president to execute the request.  On a farm node, however, there is no ORB; instead, the scha call goes directly to the rgm president via an RPC call.  In the function scha_control_action(), which carries out the scha_control request, you will find separate code paths for Europa farm nodes (bracketed by "#ifdef EUROPA_FARM") and for server nodes.
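
Schematically, the split looks something like the sketch below.  This is not the actual body of scha_control_action(); the two transport functions are invented stand-ins for the real RPC and ORB mechanisms:

/*
 * Sketch only.  rpc_call_president() and orb_invoke_president() are
 * hypothetical stand-ins, not functions from the code base.
 */
#ifdef EUROPA_FARM
extern int rpc_call_president(const char *op);    /* ONC RPC transport */
#else
extern int orb_invoke_president(const char *op);  /* ORB transport */
#endif

int
scha_control_action(const char *op)
{
#ifdef EUROPA_FARM
        /* Farm node: no ORB available, so go straight to the rgm
         * president over RPC. */
        return (rpc_call_president(op));
#else
        /* Server node: the slave-node rgmd makes an ORB invocation
         * on the rgm president. */
        return (orb_invoke_president(op));
#endif
}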

Similarly, other requests that require a slave node to communicate with the rgm president have an alternate Europa farm-node path alongside the server-node path.  The code is messy because the Europa code is interleaved with the non-Europa code.  For example, the function idl_process_intention(), by which the rgm president orders a slave node to process a CCR update, was modified like this:

#ifdef EUROPA_FARM
/* Farm-node variant: entered via an RPC from the president. */
bool_t
rgmx_process_intention_1_svc(
         rgmx_process_intention_args *rpc_args,
         void *,
         struct svc_req *)
#else
/* Server-node variant: entered via an ORB invocation. */
void
rgm_comm_impl::idl_process_intention(
         sol::nodeid_t president, sol::incarnation_num incarnation,
         Environment &)
#endif
{
        ....

So we have two different function headers, one used by the farm-node rgmd (via librgmfarm) and the other used by the server-node rgmd (via librgmserver).  The function body is common to both versions, except for a few additional "#ifdef EUROPA_FARM ... #endif" sections.

Note that the Europa enhancements were written long after the original 
server-node code and by a different team; and this is the approach they chose 
rather than the more difficult, but arguably better, approach of rewriting the 
code entirely.

The above code paths are for slave-to-president calls, so the #ifdef mechanism 
generates either farm-node code or server-node code depending on whether 
EUROPA_FARM is defined.  In our builds, EUROPA_FARM is always undefined so we 
are not building the farm node software.  When reading the code, you may 
disregard the EUROPA_FARM segments.

The president-to-slave interfaces use a different mechanism, rgmx_hook_call(). 
On a server node, ORB invocations are made directly to the slave by the 
president.  However, if the slave is a farm node, there is no ORB.  So the 
rgmx_hook_call interface is used.  For example, consider the function 
rgm_change_mastery() by which the president commands the slave to bring 
specified resource groups online or offline.  We see code like this:

idlretval_t
rgm_change_mastery(rgm::lni_t lni, namelist_t *on_list, namelist_t *off_list)
{
         ...

         /* If the slave is a farm node, the hook does the RPC and we
          * are done; is_farmnode tells us which case applies. */
         if ((rgmx_hook_call(rgmx_hook_chg_mastery,
                 slave, lni, lnNode->getIncarnation(),
                 on_list, off_list,
                 &retval, &is_farmnode) == RGMX_HOOK_ENABLED) &&
                 (is_farmnode)) {
                 return (retval);
         }

         /* Server-node slave: fall through to the ORB invocation. */
        ...

If the slave is a farm node, rgmx_hook_call() executes an RPC to the farm node to carry out the change_mastery() call and sets is_farmnode to true.  If the slave is a server node, rgmx_hook_call() is a no-op, except that it sets is_farmnode to false.  In the latter case, the code falls through to the main body of the function, which uses the ORB interface to communicate with the slave.

Note that rgmx_hook_call() is not wrapped in #ifdef, so it actually is compiled 
into our code (though the calls essentially reduce to no-ops in the absence of 
Europa).
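
If it helps to picture it, rgmx_hook_call() can be thought of as dispatching through a table of hook functions that is populated only when the Europa extensions start up.  The sketch below is my own illustration of that behavior -- the table, the RGMX_HOOK_DISABLED name, and the function types are assumptions, not the actual source:

#include <stdarg.h>
#include <stddef.h>

typedef enum { RGMX_HOOK_DISABLED, RGMX_HOOK_ENABLED } rgmx_hook_status_t;
typedef enum { rgmx_hook_chg_mastery, RGMX_HOOK_MAX } rgmx_hook_id_t;
typedef void (*rgmx_hook_fn_t)(va_list);

/* All NULL in a non-Europa build; the Europa library would fill this
 * in at startup. */
static rgmx_hook_fn_t rgmx_hooks[RGMX_HOOK_MAX];

rgmx_hook_status_t
rgmx_hook_call(rgmx_hook_id_t id, ...)
{
        va_list ap;

        if (rgmx_hooks[id] == NULL)
                return (RGMX_HOOK_DISABLED);    /* no-op without Europa */

        va_start(ap, id);
        rgmx_hooks[id](ap);     /* e.g. RPC to a farm node; the handler
                                 * sets *retval and *is_farmnode */
        va_end(ap);
        return (RGMX_HOOK_ENABLED);
}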

Besides scaling to a large number of nodes, Europa had some other interesting 
pieces of technology:

- Farm nodes can keep operating even when the server nodes are down.

- The Europa code is all user-space except for the heartbeats.

- The Europa farm code had a Linux implementation.

....................................................................

Now to answer your second question:

 > How to define the node id for a non-global zone's rgm?

A non-global zone does not have its own rgm.  A cluster of Solaris hosts (physical systems, LDoms, or virtual-machine domains) can support two kinds of clusters: the global cluster, which consists of native-brand zones, and zone clusters, each consisting of a set of cluster-brand non-global zones, one zone per Solaris host.  For a given set of Solaris hosts there is just one global cluster, and there can be multiple zone clusters.

The global cluster has its own rgmd which runs in the global zone of each 
Solaris host.  Each zone cluster has its own rgmd which also runs in the global 
zone of each Solaris host.
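
Schematically, one Solaris host belonging to a physical cluster that hosts two zone clusters runs one rgmd per cluster, all in the global zone (the labels below are illustrative, not actual process names):

        global zone of one Solaris host
        |
        +-- rgmd  (for the global cluster)
        +-- rgmd  (for zone cluster zc1)
        +-- rgmd  (for zone cluster zc2)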

For a non-global zone belonging to the global cluster -- i.e., for a native-brand non-global zone -- the rgmd automatically assigns logical node identifiers (lni's) starting at 65.  For a zone belonging to a zone cluster -- i.e., for a cluster-brand zone -- the nodeid assigned to the zone is the same as the nodeid of the associated global zone of the global cluster.
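
As a rough sketch of that assignment rule (the starting value 65 comes from the text above; zone_t and the field and function names are made up for illustration, not the real rgmd data structures):

#define FIRST_ZONE_LNI  65

typedef unsigned int lni_t;

typedef struct zone {
        int     cluster_brand;          /* nonzero = zone-cluster member */
        lni_t   hosting_nodeid;         /* nodeid of the hosting node */
} zone_t;

lni_t
assign_lni(const zone_t *z)
{
        static lni_t next_lni = FIRST_ZONE_LNI;

        if (z->cluster_brand)
                return (z->hosting_nodeid);     /* same as the global zone */
        return (next_lni++);    /* native-brand zones get 65, 66, ... */
}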

So if there are multiple zone clusters zc1, zc2, zc3 running on the same physical cluster, then on nodeid 2 of the physical cluster you will have nodeid 2 of each of the zone clusters.  Each zone cluster's "nodeid 2" is a separate cluster-brand zone.  Each zone cluster has its own namespace for nodeids, resource group names, etc.

The global cluster has one namespace for all of its zones, resource groups, etc.

I hope this answers your questions.  The latest Solaris Cluster 3.2 update release (Solaris Cluster 3.2 1/09) will contain full documentation of the new zone cluster features.  I would encourage you to read those manuals when they become available.  Perhaps someone on this list can provide a reference to those docs.

Regards,
--Marty


On 01/16/09 21:56, yang wrote:
> Hi all.
> What is the difference of RGMd and RGMx ? 
> I found that the start RGMd will register rgmx_hook_call.
> yet I don't know the difference of rgmx and rgmd.
>
> Edward
