I don't really understand the inner workings of IPoIB, so forgive me if this is a
really stupid question, but:

   Is it a bug that there is a Multicast group created for every node in our
   clusters?

If it is not a bug, why is this done?  We just tried to boot an 1151-node
cluster, and opensm is complaining that there are not enough multicast groups.

   Jan 11 18:30:42 728984 [40C05960] -> __get_new_mlid: ERR 1B23: All available:1024 mlids are taken
   Jan 11 18:30:42 729050 [40C05960] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed
   Jan 11 18:30:42 730647 [40401960] -> __get_new_mlid: ERR 1B23: All available:1024 mlids are taken
   Jan 11 18:30:42 730691 [40401960] -> osm_mcmr_rcv_create_new_mgrp: ERR 1B19: __get_new_mlid failed


Here is the output from my small test cluster:  (ibnodesinmcast uses saquery a
couple of times to print this nice report.)


   19:17:24 > whatsup
   up:   9: wopr[0-7],wopri
   down: 0:
   [EMAIL PROTECTED]:/tftpboot/images
   19:25:03 > ibnodesinmcast -g
   0xC000 (0xff12401bffff0000 : 0x00000000ffffffff)
      In  9: wopr[0-7],wopri
      Out 0: 0
   0xC001 (0xff12401bffff0000 : 0x0000000000000001)
      In  9: wopr[0-7],wopri
      Out 0: 0
   0xC002 (0xff12601bffff0000 : 0x00000001ff2265ed)
      In  1: wopr3
      Out 8: wopr[0-2,4-7],wopri
   0xC003 (0xff12601bffff0000 : 0x0000000000000001)
      In  9: wopr[0-7],wopri
      Out 0: 0
   0xC004 (0xff12601bffff0000 : 0x00000001ff222729)
      In  1: wopr4
      Out 8: wopr[0-3,5-7],wopri
   0xC005 (0xff12601bffff0000 : 0x00000001ff219e65)
      In  1: wopri
      Out 8: wopr[0-7]
   0xC006 (0xff12601bffff0000 : 0x00000001ff00232d)
      In  1: wopr6
      Out 8: wopr[0-5,7],wopri
   0xC007 (0xff12601bffff0000 : 0x00000001ff002325)
      In  1: wopr7
      Out 8: wopr[0-6],wopri
   0xC008 (0xff12601bffff0000 : 0x00000001ff228d35)
      In  1: wopr1
      Out 8: wopr[0,2-7],wopri
   0xC009 (0xff12601bffff0000 : 0x00000001ff2227f1)
      In  1: wopr2
      Out 8: wopr[0-1,3-7],wopri
   0xC00A (0xff12601bffff0000 : 0x00000001ff219ef1)
      In  1: wopr0
      Out 8: wopr[1-7],wopri
   0xC00B (0xff12601bffff0000 : 0x00000001ff0021e9)
      In  1: wopr5
      Out 8: wopr[0-4,6-7],wopri


Apart from 0xC003, each of the MGIDs with the prefix 0xff12601bffff0000 has just
one node in it and represents an IPv6 address.  Is it possible to turn off IPv6
with the latest IPoIB?
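
For what it's worth, here is a rough sketch of how I am reading these MGIDs.
It assumes the IPoIB multicast mapping as I understand it from RFC 4391: the
upper 64 bits hold 0xff, a flags/scope byte, a signature (0x401b for IPv4,
0x601b for IPv6) and the P_Key, and the remaining bits carry the low bits of
the IP multicast group.  The function name describe_mgid is made up for
illustration and is not part of saquery or ibnodesinmcast.

```python
import ipaddress

IPV4_SIGNATURE = 0x401B  # assumed from RFC 4391
IPV6_SIGNATURE = 0x601B  # assumed from RFC 4391
SOLICITED_NODE = ipaddress.IPv6Network("ff02::1:ff00:0/104")


def describe_mgid(high, low):
    """Describe an MGID given as two 64-bit halves (hypothetical helper)."""
    scope = (high >> 48) & 0xF          # low nibble of the flags/scope byte
    signature = (high >> 32) & 0xFFFF   # 0x401b or 0x601b
    if signature == IPV4_SIGNATURE:
        if low & 0xFFFFFFFF == 0xFFFFFFFF:
            return "IPv4 broadcast"
        # Only the low 28 bits of the IPv4 group are carried in the MGID.
        group = ipaddress.IPv4Address(0xE0000000 | (low & 0x0FFFFFFF))
        return f"IPv4 {group}"
    if signature == IPV6_SIGNATURE:
        # The low 80 bits of the IPv6 group follow the P_Key.
        low80 = ((high & 0xFFFF) << 64) | low
        addr = ipaddress.IPv6Address(((0xFF00 | scope) << 112) | low80)
        if addr in SOLICITED_NODE:
            return f"IPv6 {addr} (solicited-node)"
        if addr == ipaddress.IPv6Address("ff02::1"):
            return f"IPv6 {addr} (all-nodes)"
        return f"IPv6 {addr}"
    return "unknown signature"


if __name__ == "__main__":
    # A few MGIDs copied from the report above.
    for high, low in [
        (0xFF12401BFFFF0000, 0x00000000FFFFFFFF),  # 0xC000
        (0xFF12601BFFFF0000, 0x0000000000000001),  # 0xC003
        (0xFF12601BFFFF0000, 0x00000001FF2265ED),  # 0xC002
    ]:
        print(f"{high:#018x} : {low:#018x} -> {describe_mgid(high, low)}")
```

If that reading is right, the single-member groups are IPv6 solicited-node
groups, one per interface, which would explain why 1151 nodes run past the
1024 available mlids.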

In a bind,
Ira
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
