There have been several discussions on SM issues with IPv6 solicited node multicast (SNM) scalability. There was a thread entitled "IPv6 and IPoIB scalability issue" (http://lists.openfabrics.org/pipermail/general/2006-November/029621.html) and a couple of subsequent threads on a workaround.
It is proposed here to remove the workaround and replace it with a complete solution for this issue.

PROBLEM STATEMENT

The primary issue is that IPv6 made the tradeoff of using separate SNM multicast groups (rather than broadcast) for neighbor discovery (ND). SMs that use a simple 1:1 mapping of multicast group (MGID) to multicast LID (MLID) consume too many MLIDs in large clusters.

CURRENT DESIGN

The current workaround in OpenSM is an option called consolidate_ipv6_snm_req. This workaround attempts to compress the IPv6 SNM groups to 1 MLID. Limitations of this workaround have been discussed on this list previously.

Underlying this, the current OpenSM design assumes a 1:1 mapping of multicast group to MLID. It currently utilizes a "quick" map (qmap), which is a red/black tree supporting keys of up to 64 bits. Unfortunately, the multicast group ID (MGID) is a 128 bit key.

PROPOSED APPROACH

The approach is in 2 steps:

1. Change the current underlying multicast tree from being MLID based to MGID based. This involves using fleximap rather than qmap. The downside is that MLID lookups will be slower, as the MLID will no longer be the key in the map. Rather than searching by MLID key, the tree will need to be scanned entry by entry for MLID matches. It's unclear how much this will slow down MLID searches, but it is thought that none of these searches are time critical (and they shouldn't cause any existing timeouts to "pop").

2. Add support for overloading MLIDs. On the configuration side, a number of additional options would be added to consolidate_ipv6_snm_req. These include the number of MLIDs to compress down to (default 16), a multicast group (MGID) base address, and a (full MGID) mask. The base address and mask would default to

    0xff1Z601bXXXX0000 : 0x00000001ffYYYYYY

where Z is the scope, XXXX is the P_Key, and YYYYYY is the last 24 bits of the port GUID (the YYYYYY bits would be masked out by default).
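
As a rough illustration of the base/mask idea in step 2 (this is not OpenSM code), the Python sketch below checks whether a 128 bit MGID falls under the SNM base once the low 24 bits are masked out, and then picks one of the configured MLID slots. The scope (2) and P_Key (0xffff) values, the names SNM_BASE, SNM_MASK, is_snm_group and mlid_slot, and the "low 24 bits modulo the number of MLIDs" slot choice are all assumptions made for the example; the proposal above does not say how groups would be spread across the MLIDs.

```python
# Hypothetical sketch: MGIDs are held as 128 bit Python integers.
# Illustrative values only: scope Z = 2 and P_Key XXXX = 0xffff are assumed.
SNM_BASE = (0xff12601bffff0000 << 64) | 0x00000001ff000000

# Full 128 bit mask with the low 24 bits (YYYYYY) masked out.
SNM_MASK = ((1 << 128) - 1) & ~0xffffff

NUM_MLIDS = 16  # "number of MLIDs to compress down to" default from the text


def is_snm_group(mgid, base=SNM_BASE, mask=SNM_MASK):
    """Return True if mgid matches the base once masked bits are ignored."""
    return (mgid & mask) == (base & mask)


def mlid_slot(mgid, num_mlids=NUM_MLIDS):
    """Pick a slot in [0, num_mlids) for an SNM group, or None otherwise.

    Assumption: the slot is taken from the low 24 bits modulo num_mlids.
    """
    if not is_snm_group(mgid):
        return None
    return (mgid & 0xffffff) % num_mlids


if __name__ == "__main__":
    snm = SNM_BASE | 0xabcdef  # SNM group for a port GUID ending in abcdef
    ipv4_bcast = (0xff12401bffff0000 << 64) | 0x00000000ffffffff
    print(is_snm_group(snm), mlid_slot(snm))
    print(is_snm_group(ipv4_bcast), mlid_slot(ipv4_bcast))
```

Because the scope and P_Key bits stay inside the mask in this sketch, groups with different P_Keys would never match the same base, which keeps them off a shared MLID.
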
This default base address and mask are what the current workaround uses for collapsing the multicast groups.

The criteria for overloading MLIDs include any group parameters that need to be in common (e.g. rate, MTU, perhaps P_Key (see below), etc.).

Aside from changing the underlying implementation of MLID searches, multicast group deletion will need another check when there are no ports left in a group. If that group is on a compressed MLID (this part of the check is an optimization), then the multicast group tree needs to be checked to ensure there are no other groups sharing that MLID.

IBA 1.2.1 v1 p.151 4.1.3 Local Identifiers item 10) states:
"When a multicast LID is overloaded, the multicast groups sharing the same MLID must have the same P_Key. This simplification is required to allow switches and routers that implement optional P_Key enforcement for multicast operations."
This is part of the C4-5 compliance.

OPEN ISSUE

As the P_Key is part of the MGID, does this need to be addressed, and if so, how?

More on the above as I get further. If the approach above seems reasonable, I will work on such a set of patches.

Comments? Thoughts?

-- Hal
