Thiru,
As promised, you can find my new IPSQ IPMP code -- along with comments
that explain it -- at http://cr.grommit.com/~meem/ipmp-cv-dev/
In particular, check out:
* The block comment in ip.h (part of which is provided below).
* The comments and code for ipsq_dq(), ipsq_exit(), and the
ip_sioctl_groupname()/ip_join_illgrps() interaction in ip_if.c.
* The code that adjusts ipsq_swxop in ipmp.c.
Keep in mind the codebase in general is a work in progress -- but I
believe the IPSQ design and code are pretty solid.
Note that the above webrev has an "IPMP-less" Nevada parent, so the "left"
side of the diffs shows the simple IPMP-less IPSQ design, in which each
phyint is associated with one IPSQ and the IPSQs always operate completely
independently of one another. The big change from the earlier IPMP IPSQ
design is that IPSQs are never merged or split. Instead, all of the fields
in the IPSQ structure that are tied to the current exclusive operation
("xop") have been moved to a new ipxop_t which the IPSQ points at. Every
IPSQ points to exactly one xop, but all IPSQs associated with phyints in
the same IPMP group use the same xop. Specifically, from the comments in
ip.h:
/*
* The IP-MT design revolves around the serialization objects ipsq_t (IPSQ)
* and ipxop_t (exclusive operation or "xop"). Becoming "writer" on an IPSQ
* ensures that no other threads can become "writer" on any IPSQs sharing that
* IPSQ's xop until the writer thread is done.
*
* Each phyint points to one IPSQ that remains fixed over the phyint's life.
* Each IPSQ points to one xop that can change over the IPSQ's life. If a
* phyint is *not* in an IPMP group, then its IPSQ will refer to the IPSQ's
* "own" xop (ipsq_ownxop). If a phyint *is* part of an IPMP group, then its
* IPSQ will refer to the "group" xop, which is shorthand for the xop of the
* IPSQ of the IPMP meta-interface's phyint. Thus, all phyints that are part
* of the same IPMP group will have their IPSQs point to the group xop, and
* thus becoming "writer" on any phyint in the group will prevent any other
* writer on any other phyint in the group. All IPSQs sharing the same xop
* are chained together through ipsq_next (in the degenerate common case,
* ipsq_next simply refers to itself). Note that the group xop is guaranteed
* to exist at least as long as there are members in the group, since the IPMP
* meta-interface can only be destroyed if the group is empty.
*
* Incoming exclusive operation requests are enqueued on the IPSQ they arrived
* on rather than the xop. This makes switching xops (as would happen when a
* phyint leaves an IPMP group) simple, because after the phyint leaves the
* group, any operations enqueued on its IPSQ can be safely processed with
* respect to its new xop, and any operations enqueued on the IPSQs of its
* former group can be processed with respect to their existing group xop.
* Even so, switching xops is a subtle dance; see ipsq_dq() for details.
*
* An IPSQ's "own" xop is embedded within the IPSQ itself since they have
* identical lifetimes, and because doing so simplifies pointer management.
* While each phyint and IPSQ point to each other, it is not possible to free
* the IPSQ when the phyint is freed, since we may still be *inside* the IPSQ
* when the phyint is being freed. Thus, ipsq_phyint is set to NULL when the
* phyint is freed, and the IPSQ free is later done in ipsq_exit().
*/
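To make the pointer relationships in that comment concrete, here is a
minimal standalone sketch. It is not the code in the webrev: the field
names ipsq_xop, ipsq_ownxop and ipsq_next follow the comment above, but
ipx_writer and all of the sketch_*() functions are hypothetical names made
up for illustration. It also ignores locking, the per-IPSQ operation
queues, and the deferred xop switch that ipsq_dq() performs, and simply
switches xops immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real structures (illustration only). */
typedef struct ipxop {
	void	*ipx_writer;		/* hypothetical: current writer, or NULL */
} ipxop_t;

typedef struct ipsq {
	struct ipsq	*ipsq_next;	/* circular list of IPSQs sharing ipsq_xop */
	ipxop_t		*ipsq_xop;	/* xop this IPSQ currently uses */
	ipxop_t		ipsq_ownxop;	/* embedded "own" xop, same lifetime as IPSQ */
} ipsq_t;

/* A fresh IPSQ uses its own xop and is chained only to itself. */
static void
sketch_init(ipsq_t *ipsq)
{
	ipsq->ipsq_ownxop.ipx_writer = NULL;
	ipsq->ipsq_xop = &ipsq->ipsq_ownxop;
	ipsq->ipsq_next = ipsq;
}

/*
 * Joining a group: point at the group xop (the own xop of the IPMP
 * meta-interface's IPSQ) and link into the group's circular list.
 */
static void
sketch_join(ipsq_t *ipsq, ipsq_t *grp_ipsq)
{
	assert(ipsq->ipsq_xop == &ipsq->ipsq_ownxop);	/* not already in a group */
	ipsq->ipsq_xop = &grp_ipsq->ipsq_ownxop;
	ipsq->ipsq_next = grp_ipsq->ipsq_next;
	grp_ipsq->ipsq_next = ipsq;
}

/* Leaving a group: unlink from the list and fall back to the own xop. */
static void
sketch_leave(ipsq_t *ipsq)
{
	ipsq_t *prev = ipsq;

	while (prev->ipsq_next != ipsq)
		prev = prev->ipsq_next;
	prev->ipsq_next = ipsq->ipsq_next;
	ipsq->ipsq_next = ipsq;
	ipsq->ipsq_xop = &ipsq->ipsq_ownxop;
}

/* Returns 1 if `who` became writer, 0 if the shared xop is already held. */
static int
sketch_try_enter(ipsq_t *ipsq, void *who)
{
	if (ipsq->ipsq_xop->ipx_writer != NULL)
		return (0);
	ipsq->ipsq_xop->ipx_writer = who;
	return (1);
}

/* Release whatever xop this IPSQ currently points at. */
static void
sketch_exit(ipsq_t *ipsq)
{
	ipsq->ipsq_xop->ipx_writer = NULL;
}
```

With two IPSQs joined to the same group, a successful sketch_try_enter()
on one makes sketch_try_enter() on the other fail until sketch_exit() is
called, which is the "writer on any phyint in the group excludes writers
on the rest" property. Nothing is merged or split along the way: only the
ipsq_xop pointer and the ipsq_next chain change.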
--
meem