On Tue, 2008-01-15 at 00:50 +0000, Sasha Khapyorsky wrote:
> On 16:05 Mon 14 Jan     , Hal Rosenstock wrote:
> > On Mon, 2008-01-14 at 15:35 -0800, Ira Weiny wrote:
> > > On Mon, 14 Jan 2008 12:23:34 -0800
> > > Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Mon, 2008-01-14 at 10:51 -0800, Ira Weiny wrote:
> > > > > Hey Hal, thanks for the response.  Comments below.
> > > > > 
> > > > > On Mon, 14 Jan 2008 12:57:45 -0500
> > > > > "Hal Rosenstock" <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > Hi Ira,
> > > > > > 
> > > > > > On 1/12/08, Ira Weiny <[EMAIL PROTECTED]> wrote:
> > > > > > > And to further answer my question...[*]
> > > > > > >
> > > > > > > This seems to fix the problem for us; however, I know that it could be
> > > > > > > better.  For example, it only takes care of partition 0xFFFF, and I think
> > > > > > > Jason's idea of having, say, 16 Mcast Groups and hashing these into them
> > > > > > > would be nice.  But is this on the right track?  Am I missing some other
> > > > > > > place in the code?
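A minimal sketch of the bucketing idea above. It assumes 16 shared groups and uses the simplest possible hash: the low 24 bits of the solicited-node interface ID, modulo the bucket count. The function `snm_bucket` and the constants are hypothetical and are not OpenSM code. Unlike the patch's INT_ID_MASK test, the sketch requires the upper 40 bits of the interface ID to be exactly 0x00000001ff.

```c
#include <stdint.h>

/* Hypothetical constants for the IPoIB solicited-node MGID interface ID,
 * 0x00000001ffXXXXXX in host byte order. */
#define SNM_IID_SIG   0x00000001ff000000ULL /* required upper 40 bits */
#define SNM_IID_LOW24 0x0000000000ffffffULL /* per-port bits */

/* Map a solicited-node interface ID onto one of nbuckets shared groups.
 * Returns -1 if nbuckets is 0 or the interface ID is not of the
 * solicited-node form. */
static int snm_bucket(uint64_t interface_id, unsigned nbuckets)
{
	if (nbuckets == 0 || (interface_id & ~SNM_IID_LOW24) != SNM_IID_SIG)
		return -1;

	/* Simplest possible "hash": low 24 bits modulo the bucket count. */
	uint32_t low24 = (uint32_t)(interface_id & SNM_IID_LOW24);
	return (int)(low24 % nbuckets);
}
```

With 16 buckets and an even spread, a neighbor solicitation for a given target would reach roughly 1/16 of the ports rather than all of them. The cost is 16 MLIDs instead of one.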
> > > > > > 
> > > > > > This is a start.
> > > > > > 
> > > > > > Some initial comments on a quick scan of the approach used:
> > > > > > 
> > > > > > This assumes a homogeneous subnet (in terms of rates and MTUs).  I
> > > > > > think that only groups which share the same rate and MTU can share the
> > > > > > same MLID.
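One way to express this constraint as a check is sketched below. It assumes the MTU and rate bytes use the MCMemberRecord packing: the upper 2 bits hold the selector and the lower 6 bits hold the encoded value. That packing is consistent with the 0x84/0x83 values in the saquery dump quoted further down. Requiring the same P_Key is an extra assumption. `struct mgrp_params` and `can_share_mlid` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the per-group parameters that matter when
 * deciding whether two MGIDs may share one MLID. */
struct mgrp_params {
	uint16_t pkey;
	uint8_t mtu_byte;  /* e.g. 0x84: selector 2 ("exactly"), MTU code 4 = 2048 */
	uint8_t rate_byte; /* e.g. 0x83: selector 2 ("exactly"), rate code 3 = 10 Gb/s */
};

/* Strip the 2-bit selector, keeping the 6-bit MTU or rate code. */
static uint8_t code_of(uint8_t field)
{
	return field & 0x3f;
}

/* Two groups may share an MLID only if partition, MTU and rate all agree. */
static bool can_share_mlid(const struct mgrp_params *a,
			   const struct mgrp_params *b)
{
	return a->pkey == b->pkey &&
	       code_of(a->mtu_byte) == code_of(b->mtu_byte) &&
	       code_of(a->rate_byte) == code_of(b->rate_byte);
}
```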
> > > > > 
> > > > > Ah, indeed this might be an issue.  This might not be the best place for
> > > > > the code.  :-(
> > > > > 
> > > > > > 
> > > > > > Also, MLIDs will now need to be use counted and only removed when all
> > > > > > the groups sharing that MLID are removed.
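A minimal sketch of that use counting follows. It assumes a flat table covering 1024 MLIDs starting at 0xC000, which matches the limit reported in the log further down this thread. `mlid_get` and `mlid_put` are hypothetical helpers, not existing OpenSM functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MLID_BASE  0xC000u /* first multicast LID */
#define MLID_COUNT 1024u   /* assumed pool size, per the ERR 1B23 log */

/* Number of MGIDs currently mapped onto each MLID (hypothetical table). */
static unsigned mlid_users[MLID_COUNT];

static int mlid_index(uint16_t mlid)
{
	if (mlid < MLID_BASE || mlid >= MLID_BASE + MLID_COUNT)
		return -1;
	return (int)(mlid - MLID_BASE);
}

/* Take a reference when another MGID is mapped onto this MLID. */
static bool mlid_get(uint16_t mlid)
{
	int i = mlid_index(mlid);
	if (i < 0)
		return false;
	mlid_users[i]++;
	return true;
}

/* Drop a reference when an MGID goes away.  Returns true only when the
 * last MGID sharing the MLID is gone, i.e. the MLID may now be freed. */
static bool mlid_put(uint16_t mlid)
{
	int i = mlid_index(mlid);
	if (i < 0 || mlid_users[i] == 0)
		return false; /* unknown MLID or unbalanced put */
	return --mlid_users[i] == 0;
}
```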
> > > > > 
> > > > > I don't quite understand what you mean here.  There is still a 1:1 mapping
> > > > > of MLIDs to MGIDs.
> > > > 
> > > > Didn't you just change that in that many MGIDs go to one MLID ?
> > > 
> > > Ah, this is where the confusion has been.  No, this is _not_ what I did...
> > > I see now; that is what was proposed in the thread a year ago.  However, I
> > > don't think mapping many MGIDs to 1 MLID will work well.
> > 
> > Why not ?
> > 
> > It appears to be what you did (multiple MGIDs are mapped onto one MLID,
> > 0xc002 in the case below).  Am I mistaken?
> 
> As far as I understand this patch, it is different.  Here multiple
> ports which match the IPv6 solicited node multicast address will try to
> join a single MC group (with a single MGID and a unique MLID).

I don't think you are using the IBA defined terminology.

An MC group is an MGID in terms of the IBA spec.  Also, the SA GetTable
with MGIDs wildcarded shows all the MGIDs.  (Does it show that "special"
MGID?)

I would phrase this differently:
All IPv6 SNM groups are mapped to a single MLID (when this feature is
enabled). It so happens that OpenSM internally does the accounting on
membership by treating them all as members of the same "base" or
"masked" group by masking off partition and the low 24 bits (port GUID).

-- Hal
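The masking described above is sketched below. The sketch assumes MGID halves in host byte order (as after cl_ntoh64() in the patch) and the IPoIB solicited-node form 0xff12601bPPPP0000 : 0x00000001ffXXXXXX. Every solicited-node MGID collapses to one base key, and any other MGID is returned unchanged. `struct mgid64` and `mgid_base_key` are hypothetical. The posted patch compares the full prefix, so it only covers P_Key 0xFFFF. This sketch masks the P_Key as well, following the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* MGID as two 64-bit halves in host byte order. */
struct mgid64 {
	uint64_t prefix;
	uint64_t iid;
};

#define PREFIX_PKEY_MASK 0x00000000ffff0000ULL /* P_Key bits in the prefix */
#define SNM_PREFIX_SIG   0xff12601b00000000ULL /* prefix with P_Key masked */
#define SNM_IID_LOW24    0x0000000000ffffffULL /* bits from the port GUID */
#define SNM_IID_SIG      0x00000001ff000000ULL /* fixed upper interface ID bits */

/* Return the key used for membership accounting: every IPv6 solicited-node
 * MGID collapses to one "base" key; any other MGID is its own key. */
static struct mgid64 mgid_base_key(struct mgid64 m)
{
	bool snm = (m.prefix & ~PREFIX_PKEY_MASK) == SNM_PREFIX_SIG &&
		   (m.iid & ~SNM_IID_LOW24) == SNM_IID_SIG;

	if (snm) {
		m.prefix &= ~PREFIX_PKEY_MASK; /* mask off partition */
		m.iid &= ~SNM_IID_LOW24;       /* mask off low 24 bits */
	}
	return m;
}
```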

> Sasha
> 
> > 
> > > What I did was to allow the first IPv6 request to create the group and
> > > then add all other requests to this group.
> > 
> > You are using the word group loosely here and that is the source of the
> > confusion IMO. I think by group you mean MLID.
> > 
> > >   This sends all the neighbor discovery messages to all nodes on the 
> > > network.
> > 
> > All nodes part of that MLID tree.
> > 
> > >   This might seem inefficient but should work.  (... and seems to.)
> > 
> > Sure; the hosts will filter based on MGID. The tradeoff is MLID
> > utilization versus fabric utilization.
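Conceptually, the host-side filtering is a membership test on the destination MGID. The sketch below uses hypothetical names (`mgid_t`, `mgid_from_u64`, `host_accepts`) and does not try to show where this check happens in a real HCA/driver stack. It illustrates the fabric cost in the tradeoff: every port on the shared MLID tree receives every solicitation and discards those for MGIDs it has not joined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit MGID stored as raw bytes in network order. */
typedef struct {
	uint8_t raw[16];
} mgid_t;

/* Build an MGID from the two host-order halves printed by saquery. */
static mgid_t mgid_from_u64(uint64_t prefix, uint64_t iid)
{
	mgid_t g;
	for (int i = 0; i < 8; i++) {
		g.raw[i] = (uint8_t)(prefix >> (56 - 8 * i));
		g.raw[8 + i] = (uint8_t)(iid >> (56 - 8 * i));
	}
	return g;
}

/* Accept a packet only if its destination MGID is one this port joined;
 * traffic for other MGIDs that share the same MLID is dropped. */
static bool host_accepts(const mgid_t *dgid, const mgid_t *joined, size_t njoined)
{
	for (size_t i = 0; i < njoined; i++)
		if (memcmp(dgid->raw, joined[i].raw, sizeof dgid->raw) == 0)
			return true;
	return false;
}
```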
> > 
> > > > > All of the requests for this type of MGRP join are routed to one group.
> > > > > Therefore, I thought the same rules for deleting the group would apply:
> > > > > when all the members are gone, it is removed?
> > > > 
> > > > Yes, the group may go but not the underlying MLID as there are other
> > > > groups which are sharing this. That's not what happens now.
> > > 
> > > No, since there is only 1 group in this implementation, it should work
> > > like the others.  The first node of this "mgid type" will create the group.
> > > Others will join it and will continue to use it even if the creator leaves.
> > 
> > Are you saying all these groups appear as 1 "group" to OpenSM (as the
> > real groups are masked to the same value) ?
> > 
> > -- Hal
> > 
> > > Does this make more sense?
> > > 
> > > Ira
> > > 
> > > > 
> > > > >   Just to be clear, after
> > > > > this patch the mgroups are:
> > > > > 
> > > > > 09:36:40 > saquery -g
> > > > > MCMemberRecord group dump:
> > > > >                 MGID....................0xff12401bffff0000 : 0x00000000ffffffff
> > > > >                 Mlid....................0xC000
> > > > >                 Mtu.....................0x84
> > > > >                 pkey....................0xFFFF
> > > > >                 Rate....................0x83
> > > > > MCMemberRecord group dump:
> > > > >                 MGID....................0xff12401bffff0000 : 0x0000000000000001
> > > > >                 Mlid....................0xC001
> > > > >                 Mtu.....................0x84
> > > > >                 pkey....................0xFFFF
> > > > >                 Rate....................0x83
> > > > > MCMemberRecord group dump:
> > > > >                 MGID....................0xff12601bffff0000 : 0x00000001ff0021e9
> > > > >                 Mlid....................0xC002
> > > > >                 Mtu.....................0x84
> > > > >                 pkey....................0xFFFF
> > > > >                 Rate....................0x83
> > > > > MCMemberRecord group dump:
> > > > >                 MGID....................0xff12601bffff0000 : 0x0000000000000001
> > > > >                 Mlid....................0xC003
> > > > >                 Mtu.....................0x84
> > > > >                 pkey....................0xFFFF
> > > > >                 Rate....................0x83
> > > > > 
> > > > > All of these requests are added to the
> > > > >    MGID....................0xff12601bffff0000 : 0x00000001ff0021e9
> > > > >    Mlid....................0xC002
> > > > > group.  But as you say, how do we determine that the pkey, mtu, and rate
> > > > > are valid?  :-/
> > > > > 
> > > > > But here is a question:
> > > > > 
> > > > > What happens if someone with an incorrect MTU tries to join the
> > > > >    MGID....................0xff12401bffff0000 : 0x0000000000000001
> > > > > group?  Wouldn't this code return this mgrp pointer and then fail the
> > > > > subsequent MTU and rate checks?  I seem to recall a thread discussing this
> > > > > before, but I don't remember the outcome.  I think the question was whether
> > > > > OpenSM should create/modify a group to the "lowest common" MTU/Rate and
> > > > > succeed all the joins, vs. enforcing the faster MTU/Rate and failing the
> > > > > joins.
> > > > 
> > > > Yes, the join would fail, but I don't think that's what we would want.
> > > > The alternative with the patch is to make it the lowest rate but there
> > > > is a minimum MTU which might not be right.
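The two policies under discussion can be sketched as follows:

- `JOIN_STRICT` fails a join that cannot handle the group's MTU/rate.
- `JOIN_LOWEST_COMMON` lowers the group instead, but refuses to go below a floor MTU. The floor reflects the concern about a minimum MTU.

The enum, struct and function names are hypothetical. Rate is compared through a lookup because IBA rate codes are not ordered by speed (code 5 is 5 Gb/s, while code 4 is 30 Gb/s). Only codes 2 to 6 are included in this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy for a join whose MTU/rate is below the group's. */
enum join_policy { JOIN_STRICT, JOIN_LOWEST_COMMON };

/* 6-bit IBA codes: MTU 1..5 = 256..4096 bytes; rate as below. */
struct grp_caps {
	uint8_t mtu;
	uint8_t rate;
};

/* Rate code -> tenths of Gb/s; the codes are not ordered by speed. */
static int rate_tenths(uint8_t code)
{
	switch (code) {
	case 2: return 25;   /* 2.5 Gb/s */
	case 3: return 100;  /* 10 Gb/s */
	case 4: return 300;  /* 30 Gb/s */
	case 5: return 50;   /* 5 Gb/s */
	case 6: return 200;  /* 20 Gb/s */
	default: return -1;  /* other codes omitted in this sketch */
	}
}

/* Returns true if the join is accepted.  Under JOIN_LOWEST_COMMON the group
 * is lowered to the joiner's MTU/rate, but never below min_mtu. */
static bool apply_join(struct grp_caps *grp, struct grp_caps joiner,
		       enum join_policy pol, uint8_t min_mtu)
{
	int g_rate = rate_tenths(grp->rate);
	int j_rate = rate_tenths(joiner.rate);

	if (g_rate < 0 || j_rate < 0)
		return false;                 /* unknown rate code */
	if (joiner.mtu >= grp->mtu && j_rate >= g_rate)
		return true;                  /* joiner can use the group as is */
	if (pol == JOIN_STRICT || joiner.mtu < min_mtu)
		return false;                 /* fail the join */
	if (joiner.mtu < grp->mtu)
		grp->mtu = joiner.mtu;        /* lower to common MTU */
	if (j_rate < g_rate)
		grp->rate = joiner.rate;      /* lower to common rate */
	return true;
}
```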
> > > > 
> > > > > > I think this is a policy and rather than this always being the case,
> > > > > > there should be a policy parameter added to OpenSM for this. IMO
> > > > > > default should be to not do this.
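The special case can be gated behind such a policy parameter that defaults to off, as sketched below. The option name `consolidate_ipv6_snm` is hypothetical. The match mirrors the patch's SPEC_PREFIX comparison, with one difference. The patch only tests that the INT_ID_MASK bits are set, whereas this sketch requires the whole upper 40 bits of the interface ID to equal 0x00000001ff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SA option; off by default. */
struct sa_opts {
	bool consolidate_ipv6_snm;
};

#define SPEC_PREFIX   0xff12601bffff0000ULL /* as in the patch: P_Key 0xFFFF */
#define SNM_IID_SIG   0x00000001ff000000ULL
#define SNM_IID_LOW24 0x0000000000ffffffULL

static bool is_snm_iid(uint64_t iid)
{
	return (iid & ~SNM_IID_LOW24) == SNM_IID_SIG;
}

/* Does a received MGID match an existing group's MGID?  An exact match
 * always counts; the solicited-node special case applies only when enabled. */
static bool mgid_matches(const struct sa_opts *opts,
			 uint64_t g_prefix, uint64_t g_iid,
			 uint64_t r_prefix, uint64_t r_iid)
{
	if (g_prefix == r_prefix && g_iid == r_iid)
		return true;
	if (!opts->consolidate_ipv6_snm)
		return false;
	return r_prefix == SPEC_PREFIX && g_prefix == r_prefix &&
	       is_snm_iid(r_iid) && is_snm_iid(g_iid);
}
```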
> > > > > 
> > > > > Yes, for sure there needs to be some options to control the behavior.
> > > > > 
> > > > > > 
> > > > > > Maybe more later...
> > > > > 
> > > > > Thanks again,
> > > > > Ira
> > > > > 
> > > > > > 
> > > > > > -- Hal
> > > > > > 
> > > > > > > Thanks,
> > > > > > > Ira
> > > > > > >
> > > > > > > [*] Again I apologize for the spam, but we were in a bit of a panic as we
> > > > > > > only have the big system for the weekend and IB was not part of the
> > > > > > > test...  ;-)
> > > > > > >
> > > > > > > From 35e35a9534bd49147886ac93ab1601acadcdbe26 Mon Sep 17 00:00:00 2001
> > > > > > > From: Ira K. Weiny <[EMAIL PROTECTED]>
> > > > > > > Date: Fri, 11 Jan 2008 22:58:19 -0800
> > > > > > > Subject: [PATCH] Special Case the IPv6 Solicited Node Multicast address to use a single Mcast Group.
> > > > > > >
> > > > > > > Signed-off-by: root <[EMAIL PROTECTED]>
> > > > > > > ---
> > > > > > >  opensm/opensm/osm_sa_mcmember_record.c |   30 +++++++++++++++++++++++++++++-
> > > > > > >  opensm/opensm/osm_sa_path_record.c     |   31 ++++++++++++++++++++++++++++++-
> > > > > > >  2 files changed, 59 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c
> > > > > > > index 8eb97ad..6bcc124 100644
> > > > > > > --- a/opensm/opensm/osm_sa_mcmember_record.c
> > > > > > > +++ b/opensm/opensm/osm_sa_mcmember_record.c
> > > > > > > @@ -124,9 +124,37 @@ __search_mgrp_by_mgid(IN cl_map_item_t * const p_map_item, IN void *context)
> > > > > > >        /* compare entire MGID so different scope will not sneak in for
> > > > > > >           the same MGID */
> > > > > > >        if (memcmp(&p_mgrp->mcmember_rec.mgid,
> > > > > > > -                  &p_recvd_mcmember_rec->mgid, sizeof(ib_gid_t)))
> > > > > > > +                  &p_recvd_mcmember_rec->mgid, sizeof(ib_gid_t))) {
> > > > > > > +
> > > > > > > +               /* Special Case IPV6 Solicited Node Multicast addresses */
> > > > > > > +               /* 0xff12601bffff0000 : 0x00000001ffXXXXXX */
> > > > > > > +#define SPEC_PREFIX (0xff12601bffff0000)
> > > > > > > +#define INT_ID_MASK (0x00000001ff000000)
> > > > > > > +               uint64_t g_prefix = cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.prefix);
> > > > > > > +               uint64_t g_interface_id = cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id);
> > > > > > > +               uint64_t rcv_prefix = cl_ntoh64(p_recvd_mcmember_rec->mgid.unicast.prefix);
> > > > > > > +               uint64_t rcv_interface_id = cl_ntoh64(p_recvd_mcmember_rec->mgid.unicast.interface_id);
> > > > > > > +
> > > > > > > +               if (rcv_prefix == SPEC_PREFIX
> > > > > > > +                       &&
> > > > > > > +                       (rcv_interface_id & INT_ID_MASK) == INT_ID_MASK) {
> > > > > > > +
> > > > > > > +                       if ((g_prefix == rcv_prefix)
> > > > > > > +                               &&
> > > > > > > +                               (g_interface_id & INT_ID_MASK) ==
> > > > > > > +                                       (rcv_interface_id & INT_ID_MASK)
> > > > > > > +                               ) {
> > > > > > > +                               osm_log(sa->p_log, OSM_LOG_INFO,
> > > > > > > +                                       "Special Case Mcast Join for "
> > > > > > > +                                       "MGID 0x%016"PRIx64" : 0x%016"PRIx64"\n",
> > > > > > > +                                       rcv_prefix, rcv_interface_id);
> > > > > > > +                               goto match;
> > > > > > > +                       }
> > > > > > > +               }
> > > > > > >                return;
> > > > > > > +       }
> > > > > > >
> > > > > > > +match:
> > > > > > >        if (p_ctxt->p_mgrp) {
> > > > > > >                osm_log(sa->p_log, OSM_LOG_ERROR,
> > > > > > >                        "__search_mgrp_by_mgid: ERR 1B03: "
> > > > > > > diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
> > > > > > > index 749a936..469773a 100644
> > > > > > > --- a/opensm/opensm/osm_sa_path_record.c
> > > > > > > +++ b/opensm/opensm/osm_sa_path_record.c
> > > > > > > @@ -1536,8 +1536,37 @@ __search_mgrp_by_mgid(IN cl_map_item_t * const p_map_item, IN void *context)
> > > > > > >
> > > > > > >        /* compare entire MGID so different scope will not sneak in for
> > > > > > >           the same MGID */
> > > > > > > -       if (memcmp(&p_mgrp->mcmember_rec.mgid, p_recvd_mgid, sizeof(ib_gid_t)))
> > > > > > > +       if (memcmp(&p_mgrp->mcmember_rec.mgid, p_recvd_mgid, sizeof(ib_gid_t))) {
> > > > > > > +
> > > > > > > +               /* Special Case IPV6 Solicited Node Multicast addresses */
> > > > > > > +               /* 0xff12601bffff0000 : 0x00000001ffXXXXXX */
> > > > > > > +#define SPEC_PREFIX (0xff12601bffff0000)
> > > > > > > +#define INT_ID_MASK (0x00000001ff000000)
> > > > > > > +               uint64_t g_prefix = cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.prefix);
> > > > > > > +               uint64_t g_interface_id = cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id);
> > > > > > > +               uint64_t rcv_prefix = cl_ntoh64(p_recvd_mgid->unicast.prefix);
> > > > > > > +               uint64_t rcv_interface_id = cl_ntoh64(p_recvd_mgid->unicast.interface_id);
> > > > > > > +
> > > > > > > +               if (rcv_prefix == SPEC_PREFIX
> > > > > > > +                       &&
> > > > > > > +                       (rcv_interface_id & INT_ID_MASK) == INT_ID_MASK) {
> > > > > > > +
> > > > > > > +                       if ((g_prefix == rcv_prefix)
> > > > > > > +                               &&
> > > > > > > +                               (g_interface_id & INT_ID_MASK) ==
> > > > > > > +                                       (rcv_interface_id & INT_ID_MASK)
> > > > > > > +                               ) {
> > > > > > > +                               osm_log(sa->p_log, OSM_LOG_INFO,
> > > > > > > +                                       "Special Case Mcast Join for "
> > > > > > > +                                       "MGID 0x%016"PRIx64" : 0x%016"PRIx64"\n",
> > > > > > > +                                       rcv_prefix, rcv_interface_id);
> > > > > > > +                               goto match;
> > > > > > > +                       }
> > > > > > > +               }
> > > > > > >                return;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +match:
> > > > > > >
> > > > > > >  #if 0
> > > > > > >        for (i = 0;
> > > > > > > --
> > > > > > > 1.5.1
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 11 Jan 2008 22:04:56 -0800
> > > > > > > Ira Weiny <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > Ok,
> > > > > > > >
> > > > > > > > I found my own answer.  Sorry for the spam.
> > > > > > > >
> > > > > > > > http://lists.openfabrics.org/pipermail/general/2006-November/029617.html
> > > > > > > >
> > > > > > > > Sorry,
> > > > > > > > Ira
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, 11 Jan 2008 19:36:57 -0800
> > > > > > > > Ira Weiny <[EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > > > I don't really understand the inner workings of IPoIB, so forgive me if
> > > > > > > > > this is a really stupid question, but:
> > > > > > > > >
> > > > > > > > >    Is it a bug that there is a Multicast group created for every node in
> > > > > > > > >    our clusters?
> > > > > > > > >
> > > > > > > > > If not a bug, why is this done?  We just tried to boot on a 1151 node
> > > > > > > > > cluster and opensm is complaining there are not enough multicast groups.
> > > > > > > > >
> > > > > > > > >    Jan 11 18:30:42 728984 [40C05960] -> __get_new_mlid: ERR 1B23: All available:1024 mlids are taken
> > > > > > > > >    Jan 11 18:30:42 729050 [40C05960] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> > > > > > > > >    Jan 11 18:30:42 730647 [40401960] -> __get_new_mlid: ERR 1B23: All available:1024 mlids are taken
> > > > > > > > >    Jan 11 18:30:42 730691 [40401960] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Here is the output from my small test cluster.  (ibnodesinmcast uses
> > > > > > > > > saquery a couple of times to print this nice report.)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >    19:17:24 > whatsup
> > > > > > > > >    up:   9: wopr[0-7],wopri
> > > > > > > > >    down: 0:
> > > > > > > > >    [EMAIL PROTECTED]:/tftpboot/images
> > > > > > > > >    19:25:03 > ibnodesinmcast -g
> > > > > > > > >    0xC000 (0xff12401bffff0000 : 0x00000000ffffffff)
> > > > > > > > >       In  9: wopr[0-7],wopri
> > > > > > > > >       Out 0: 0
> > > > > > > > >    0xC001 (0xff12401bffff0000 : 0x0000000000000001)
> > > > > > > > >       In  9: wopr[0-7],wopri
> > > > > > > > >       Out 0: 0
> > > > > > > > >    0xC002 (0xff12601bffff0000 : 0x00000001ff2265ed)
> > > > > > > > >       In  1: wopr3
> > > > > > > > >       Out 8: wopr[0-2,4-7],wopri
> > > > > > > > >    0xC003 (0xff12601bffff0000 : 0x0000000000000001)
> > > > > > > > >       In  9: wopr[0-7],wopri
> > > > > > > > >       Out 0: 0
> > > > > > > > >    0xC004 (0xff12601bffff0000 : 0x00000001ff222729)
> > > > > > > > >       In  1: wopr4
> > > > > > > > >       Out 8: wopr[0-3,5-7],wopri
> > > > > > > > >    0xC005 (0xff12601bffff0000 : 0x00000001ff219e65)
> > > > > > > > >       In  1: wopri
> > > > > > > > >       Out 8: wopr[0-7]
> > > > > > > > >    0xC006 (0xff12601bffff0000 : 0x00000001ff00232d)
> > > > > > > > >       In  1: wopr6
> > > > > > > > >       Out 8: wopr[0-5,7],wopri
> > > > > > > > >    0xC007 (0xff12601bffff0000 : 0x00000001ff002325)
> > > > > > > > >       In  1: wopr7
> > > > > > > > >       Out 8: wopr[0-6],wopri
> > > > > > > > >    0xC008 (0xff12601bffff0000 : 0x00000001ff228d35)
> > > > > > > > >       In  1: wopr1
> > > > > > > > >       Out 8: wopr[0,2-7],wopri
> > > > > > > > >    0xC009 (0xff12601bffff0000 : 0x00000001ff2227f1)
> > > > > > > > >       In  1: wopr2
> > > > > > > > >       Out 8: wopr[0-1,3-7],wopri
> > > > > > > > >    0xC00A (0xff12601bffff0000 : 0x00000001ff219ef1)
> > > > > > > > >       In  1: wopr0
> > > > > > > > >       Out 8: wopr[1-7],wopri
> > > > > > > > >    0xC00B (0xff12601bffff0000 : 0x00000001ff0021e9)
> > > > > > > > >       In  1: wopr5
> > > > > > > > >       Out 8: wopr[0-4,6-7],wopri
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Each of these MGIDs with the prefix 0xff12601bffff0000 and an interface ID
> > > > > > > > > of the form 0x00000001ffXXXXXX has just one node in it and represents an
> > > > > > > > > IPv6 address.  Could you turn off IPv6 with the latest IPoIB?
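The sketch below shows where each of these per-node MGIDs comes from. It assumes the RFC 4391 IPv6-over-IB mapping:

- The solicited-node address ff02::1:ffXX:XXXX takes the low 24 bits of the unicast address.
- The MGID is ff12:601b:PPPP followed by the low 80 bits of that multicast address.

`snm_mgid` is a hypothetical helper.

```c
#include <stdint.h>

/* Compute the IPoIB MGID (two host-order halves) of the IPv6 solicited-node
 * group for a unicast address, assuming the RFC 4391 mapping
 * ff1S:601b:PPPP + low 80 bits of ff02::1:ffXX:XXXX, with scope S = 2. */
static void snm_mgid(const uint8_t ipv6[16], uint16_t pkey,
		     uint64_t *prefix, uint64_t *iid)
{
	/* Solicited-node groups keep only the low 24 bits of the address. */
	uint32_t low24 = (uint32_t)ipv6[13] << 16 | (uint32_t)ipv6[14] << 8 | ipv6[15];

	*prefix = 0xff12601b00000000ULL | (uint64_t)pkey << 16;
	*iid = 0x00000001ff000000ULL | low24;
}
```

Each node's link-local address is derived from its port GUID, so those low 24 bits normally differ per node. As a result, every node gets its own MGID and MLID. Assuming one IPoIB port per node, 1151 nodes need 1151 solicited-node MLIDs. Adding the 3 shared groups seen above (0xC000, 0xC001, 0xC003) gives 1154, which exceeds the 1024 available.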
> > > > > > > > >
> > > > > > > > > In a bind,
> > > > > > > > > Ira
> > > > > > > > > _______________________________________________
> > > > > > > > > general mailing list
> > > > > > > > > [email protected]
> > > > > > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > > > > > > > >
> > > > > > > > > To unsubscribe, please visit 
> > > > > > > > > http://openib.org/mailman/listinfo/openib-general
> > > > > > >