Hi Edward,

Apologies for this long message. I will try to answer your questions.
> What is the difference of RGMd and RGMx ?
> I found that the start RGMd will register rgmx_hook_call.
> yet I don't know the difference of rgmx and rgmd.

"rgmd" is the name of the RGM daemon process. "rgmx" is a library of RGM extensions that were used on Europa "farm nodes".

As you might have seen on other threads, the Europa project extended the Solaris Clustering model to a "server + farm" model which made the Sun Cluster model scale to much higher node counts (128-256+). A Europa cluster would consist of two (or more) server nodes, plus a collection of farm nodes, which could number in the hundreds. Unfortunately the Europa project was not quite released as a product; however, all of its code remains in the Solaris Cluster and OHAC code base. We have not removed the code because we have not quite given up on the possibility of someday releasing Europa as a product.

In Europa, each server node runs the standard Solaris Cluster software. The server nodes are standard Solaris Cluster nodes. The RGM president is always a server node.

Each farm node runs a lighter-weight variant of Solaris Cluster software, including a reduced version of rgmd. The farm nodes are controlled by a different membership monitor which has less-stringent requirements for membership -- a farm node only needs to establish two-way connectivity with the RGM president node; we do not demand two-way connectivity between all pairs of farm nodes (as we do for server nodes). This lighter-weight membership algorithm allowed us to scale to a large number of farm nodes.

The same application management APIs (the RGM model) are used on both server nodes and farm nodes. The same Data Service agents run on server nodes and farm nodes; however, Europa did not support Oracle RAC on farm nodes, because RAC requires the strong membership semantics provided by base Solaris Cluster.

Now focusing on the rgm:

On a server node, the cluster API (libscha and the scha_* functions) is implemented by the "receptionist" code in rgmd on every node. Some requests, such as scha_control(), have to be handled by the rgm president. In the standard (server-node) code, the slave-node rgmd makes an ORB invocation to the rgm president to execute the request. However, on a farm node, there is no ORB; instead, the scha call goes directly to the rgm president via an rpc call.

In the function scha_control_action(), which carries out the scha_control request, you will find separate code paths for Europa farm nodes (bracketed by "#ifdef EUROPA_FARM") and for server nodes. Similarly, other requests that require a slave node to communicate with the rgm president have an alternate Europa farm node path and a server node path.

The code is messy because we have the Europa code interleaved with the non-Europa code. For example, the function idl_process_intention(), by which the rgm president orders a slave node to process a CCR update, was modified like this:

    #ifdef EUROPA_FARM
    bool_t
    rgmx_process_intention_1_svc(
        rgmx_process_intention_args *rpc_args,
        void *,
        struct svc_req *)
    #else
    void
    rgm_comm_impl::idl_process_intention(
        sol::nodeid_t president,
        sol::incarnation_num incarnation,
        Environment &)
    #endif
    {
        ....

So we have two different function headers, one used by the farm-node rgmd (via librgmfarm) and the other used by the server-node rgmd (via librgmserver). The function body is common to both versions, except that there are a few additional "#ifdef EUROPA_FARM ... #endif" sections.
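If it helps to see that pattern in isolation, here is a small self-contained toy program (not rgmd code -- every name in it is made up for illustration) showing how defining or not defining EUROPA_FARM selects one of two function headers over a single shared body:

    /*
     * Toy illustration of the "two headers, one body" pattern described
     * above. Nothing here is real rgmd code; the names are hypothetical.
     */
    #include <stdio.h>

    #ifdef EUROPA_FARM
    /* Farm-node build: rpc-style entry point (hypothetical signature). */
    int
    process_intention_rpc_svc(int president, int incarnation)
    #else
    /* Server-node build: ORB-style entry point (hypothetical signature). */
    int
    process_intention_idl(int president, int incarnation)
    #endif
    {
            /* Common body, shared by both build variants. */
            printf("processing CCR intention from president %d "
                "(incarnation %d)\n", president, incarnation);
    #ifdef EUROPA_FARM
            /* A few farm-only sections appear under additional #ifdefs. */
            printf("farm-node-specific bookkeeping\n");
    #endif
            return (0);
    }

    int
    main(void)
    {
    #ifdef EUROPA_FARM
            return (process_intention_rpc_svc(1, 42));
    #else
            return (process_intention_idl(1, 42));
    #endif
    }

Compiling this with and without -DEUROPA_FARM gives you the two variants, which is essentially what happens when the same source is built into librgmfarm versus librgmserver.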
Note that the Europa enhancements were written long after the original server-node code and by a different team; this is the approach they chose rather than the more difficult, but arguably better, approach of rewriting the code entirely.

The above code paths are for slave-to-president calls, so the #ifdef mechanism generates either farm-node code or server-node code depending on whether EUROPA_FARM is defined. In our builds, EUROPA_FARM is always undefined, so we are not building the farm node software. When reading the code, you may disregard the EUROPA_FARM segments.

The president-to-slave interfaces use a different mechanism, rgmx_hook_call(). On a server node, ORB invocations are made directly to the slave by the president. However, if the slave is a farm node, there is no ORB, so the rgmx_hook_call() interface is used. For example, consider the function rgm_change_mastery(), by which the president commands the slave to bring specified resource groups online or offline. We see code like this:

    idlretval_t
    rgm_change_mastery(rgm::lni_t lni, namelist_t *on_list,
        namelist_t *off_list)
    {
        ...
        if ((rgmx_hook_call(rgmx_hook_chg_mastery, slave, lni,
            lnNode->getIncarnation(), on_list, off_list,
            &retval, &is_farmnode) == RGMX_HOOK_ENABLED) &&
            (is_farmnode)) {
                return (retval);
        }
        ...

If the slave is a farm node, the rgmx_hook_call() will execute an rpc to the farm node to carry out the change_mastery() call, and will set is_farmnode to true. If the slave is a server node, the rgmx_hook_call() will be a no-op, except to set is_farmnode to false. In the latter case, the code falls through to the main body of the function, which uses the ORB interface to communicate with the slave.

Note that rgmx_hook_call() is not wrapped in #ifdef, so it actually is compiled into our code (though the calls essentially reduce to no-ops in the absence of Europa).

Besides scaling to a large number of nodes, Europa had some other interesting pieces of technology:

- farm hosts can keep operating even when server nodes are down
- Europa code is all user-space except for the heartbeats
- Europa farm code had a Linux implementation

....................................................................

Now to answer your second question:

> How to define the node id for a non-global zone's rgm?

A non-global zone does not have its own rgm.

A cluster of Solaris hosts (physical systems, LDoms, or virtual-machine domains) can support two kinds of clusters: the global cluster, which consists of native-brand zones; and zone clusters, each consisting of a set of cluster-brand non-global zones, one zone per Solaris host. For a given set of Solaris hosts there is just one global cluster, and there can be multiple zone clusters.

The global cluster has its own rgmd which runs in the global zone of each Solaris host. Each zone cluster has its own rgmd which also runs in the global zone of each Solaris host.

For a non-global zone belonging to the global cluster -- i.e., for a native-brand non-global zone -- the rgmd automatically assigns logical node identifiers (lni's) starting at 65.

For a zone belonging to a zone cluster -- i.e., for a cluster-brand zone -- the nodeid assigned to the zone is the same as the nodeid of the associated global zone of the global cluster. So if there are multiple zone clusters zc1, zc2, zc3 running on the same physical cluster, then on nodeid 2 of the physical cluster you will have nodeid 2 of each of the zone clusters. Each zone cluster's "nodeid 2" is a separate cluster-brand zone.
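To make those two rules concrete, here is a tiny self-contained sketch (again, not actual rgmd code; the helper names and the 0-based zone index are invented for illustration, only the "start at 65" and "reuse the host's nodeid" rules come from the description above):

    /*
     * Toy illustration of the node-id rules described above: native-brand
     * zones of the global cluster get lni's assigned from 65 upward, while
     * a zone-cluster zone reuses the nodeid of the Solaris host it runs on.
     */
    #include <stdio.h>

    #define FIRST_ZONE_LNI  65      /* first lni given to a native-brand zone */

    /* Hypothetical helper: lni for the n-th native-brand zone (0-based). */
    static int
    global_cluster_zone_lni(int nth_zone)
    {
            return (FIRST_ZONE_LNI + nth_zone);
    }

    /* Hypothetical helper: nodeid of a zone-cluster zone on a given host. */
    static int
    zone_cluster_nodeid(int host_nodeid)
    {
            return (host_nodeid);   /* same as the underlying global-cluster node */
    }

    int
    main(void)
    {
            /* First two native-brand zones of the global cluster: lni 65, 66. */
            printf("global-cluster zones: lni %d, %d\n",
                global_cluster_zone_lni(0), global_cluster_zone_lni(1));

            /* zc1, zc2, zc3 each have a "nodeid 2" zone on physical node 2. */
            printf("zc1/zc2/zc3 on host 2: nodeid %d in each zone cluster\n",
                zone_cluster_nodeid(2));
            return (0);
    }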
Each zone cluster has its own namespace for nodeids, resource group names, etc. The global cluster has one namespace for all of its zones, resource groups, etc.

I hope this answers your questions.

The latest Solaris Cluster 3.2 update release (Solaris Cluster 3.2 1/09) will contain full documentation of the new zone cluster features. I would encourage you to read those manuals when they are available. Perhaps someone on this list can provide a reference to those docs.

Regards,
--Marty

On 01/16/09 21:56, yang wrote:
> Hi all.
> What is the difference of RGMd and RGMx ?
> I found that the start RGMd will register rgmx_hook_call.
> yet I don't know the difference of rgmx and rgmd.
>
> Edward