Charles,

Ira Weiny wrote:
Charles,

Here at LLNL we have been running OpenSM for some time.  Thus far we are very
happy with its performance.  Our largest cluster is 1152 nodes and OpenSM can
bring it up (not counting boot time) in less than a minute.

OpenSM is successfully running on some large clusters with 4-5K nodes.
It takes about 2-3 minutes to bring up such clusters.

Here are some details.

We are running v3.1.10 of OpenSM with some minor modifications (mostly patches
that have been submitted upstream and accepted by Sasha but are not yet in a
release).

Our clusters are all Fat-tree topologies.

We have a node which is more or less dedicated to running OpenSM.  We have some
other monitoring software running on it, but OpenSM can utilize the CPU/Memory
if it needs to.

   A) On our large clusters this node is a 4-socket, dual-core (8 cores
   total) Opteron running at 2.4 GHz with 16 GB of memory.  I don't believe
   OpenSM needs this much, but the nodes were all built the same, so this is
   what it got.

   B) On one of our smaller clusters (128 nodes) OpenSM is running on a
   dual-socket, single-core (2 cores total) 2.4 GHz Opteron node with 2 GB of
   memory.  We have not seen any issues with OpenSM on this cluster.

We run with the up/down algorithm; ftree has not panned out for us yet.  I
can't say how that would compare to the Cisco algorithms.

If the cluster topology is a fat tree, then there are two relevant routing
engines: ftree and up/down. Ftree would be a good choice if you need LMC=0
(and if the topology complies with certain fat-tree rules). For any other
tree, or for LMC>0, up/down should work.
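For reference, and assuming a host-based OpenSM of the 3.1.x vintage discussed above, the routing engine is selected with the `-R` (`--routing_engine`) option. These invocations are only an illustrative sketch, not a recommendation:

```shell
# Up/down routing (works for any tree topology, and for LMC > 0)
opensm -R updn

# Fat-tree routing (a good fit when LMC=0 and the topology
# complies with the fat-tree rules)
opensm -R ftree
```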

-- Yevgeny

In short, OpenSM should work just fine on your cluster.

Hope this helps,
Ira


On Tue, 27 May 2008 11:15:14 -0400
Charles Taylor <[EMAIL PROTECTED]> wrote:

We have a 400-node IB cluster. We are running an embedded SM in failover mode on our TS270/Cisco7008 core switches. Lately we have been seeing problems with LID assignment when rebooting nodes (see log messages below). It is also taking far too long for LIDs to be assigned, as it takes on the order of minutes for the ports to transition to "ACTIVE".

This seems like a bug to us and we are considering switching to OpenSM on a host. I'm wondering about experience with running OpenSM for medium to large clusters (Fat Tree) and what resources (memory/cpu) we should plan on for the host node.

Thanks,

Charlie Taylor
UF HPC Center

May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:14:13 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM OUT_OF_SERVICE
trap for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59
May 27 14:14:13 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:256]: An existing IB
node GUID 00:02:c9:02:00:21:4b:59 LID 194 was removed
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM DELETE_MC_GROUP
trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:21:4b:59
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology
changed
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
discovering removed ports
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:16:28 topspin-270sc ib_sm.x[812]: [ib_sm_discovery.c:1009]: no
routing required for port guid 00:02:c9:02:00:21:4b:59, lid 194
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology
changed
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
discovering new ports
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
multicast membership change
May 27 14:16:30 topspin-270sc ib_sm.x[812]: [ib_sm_assign.c:588]: Force port to
go down due to LID conflict, node - GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:562]: Program port
state, node=00:02:c9:02:00:21:4b:58, port= 16, current state 2, neighbor
node=00:02:c9:02:00:21:4b:58, port= 1, current state 2
May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:733]: Failed to
negotiate MTU, op_vl for node=00:02:c9:02:00:21:4b:58, port= 1, mad status 0x1c
May 27 14:18:42 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM IN_SERVICE trap
for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59
May 27 14:18:42 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:144]: A new IB node
00:02:c9:02:00:21:4b:59 was discovered and assigned LID 0
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:18:46 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:18:46 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
previous GET/SET operation failures
May 27 14:18:46 topspin-270sc ib_sm.x[816]: [ib_sm_assign.c:545]: Reassigning
LID, node - GUID=00:02:c9:02:00:21:4b:58, port=1, new LID=411, curr LID=0
May 27 14:18:46 topspin-270sc ib_sm.x[816]: [ib_sm_assign.c:588]: Force port to
go down due to LID conflict, node - GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:46 topspin-270sc ib_sm.x[816]: [ib_sm_assign.c:635]: Clean up SA
resources for port forced down due to LID conflict, node -
GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:47 topspin-270sc ib_sm.x[803]: [ib_sm_assign.c:667]: cleaning DB
for guid 00:02:c9:02:00:21:4b:59, lid 194
May 27 14:18:47 topspin-270sc ib_sm.x[803]: [ib_sm_routing.c:2936]:
_ib_smAllocSubnet: initRate= 4
May 27 14:18:47 topspin-270sc last message repeated 23 times
May 27 14:18:47 topspin-270sc ib_sm.x[803]: [INFO]: Different capacity links
detected in the network
May 27 14:21:01 topspin-270sc ib_sm.x[820]: [ib_sm_bringup.c:516]: Active port(s) now in INIT state node=00:02:c9:02:00:21:4b:58, port=16, state=2,
neighbor node=00:02:c9:02:00:21:4b:58, port=1, state=2
May 27 14:21:01 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:21:01 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:21:01 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:525]: IB node
00:06:6a:00:d9:00:04:5d port 16 is INIT state
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
some ports in INIT state
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
previous GET/SET operation failures
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [ib_sm_routing.c:2936]:
_ib_smAllocSubnet: initRate= 4
May 27 14:21:05 topspin-270sc last message repeated 23 times
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [INFO]: Different capacity links
detected in the network
May 27 14:23:19 topspin-270sc ib_sm.x[817]: [ib_sm_bringup.c:562]: Program port
state, node=00:02:c9:02:00:21:4b:58, port= 16, current state 2, neighbor
node=00:02:c9:02:00:21:4b:58, port= 1, current state 2
May 27 14:23:24 topspin-270sc ib_sm.x[823]: [INFO]: Generate SM CREATE_MC_GROUP
trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:21:4b:59
May 27 14:23:24 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:23:24 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:23:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:23:26 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
multicast membership change
May 27 14:23:33 topspin-270sc ib_sm.x[826]: [INFO]: Standby SM guid
00:05:ad:00:00:02:3c:60, is no longer synchronized with Master SM
May 27 14:25:39 topspin-270sc ib_sm.x[826]: [INFO]: Initialize a backup session
with Standby SM guid 00:05:ad:00:00:02:3c:60
May 27 14:25:39 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:25:39 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:25:39 topspin-270sc ib_sm.x[826]: [INFO]: Standby SM guid
00:05:ad:00:00:02:3c:60, started synchronizing with Master SM
May 27 14:25:42 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:25:42 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
multicast membership change
May 27 14:25:43 topspin-270sc ib_sm.x[826]: [INFO]: Master SM DB synchronized
with Standby SM guid 00:05:ad:00:00:02:3c:60
May 27 14:25:43 topspin-270sc ib_sm.x[826]: [INFO]: Master SM DB synchronized
with all designated backup SMs
May 27 14:28:04 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:28:06 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change

On May 23, 2008, at 2:20 PM, Steve Wise wrote:

Or Gerlitz wrote:
Steve Wise wrote:
Are we sure we need to expose this to the user?
I believe this is the way to go if we want to let smart ULPs generate a new rkey/stag per mapping. Simpler ULPs could then just use the same value for each map associated with the same MR.

Or.

How should I add this to the API?

Perhaps we just document the format of an rkey in struct ib_mr. The app would then do this to change the key before posting the fast_reg_mr WR (coded to be explicit, not efficient):

u8 newkey;
u32 newrkey;

newkey = 0xaa;
newrkey = (mr->rkey & 0xffffff00) | newkey;
mr->rkey = newrkey;
wr.wr.fast_reg.mr = mr;
...


Note that this assumes mr->rkey is in host byte order (I think the Linux RDMA code assumes this elsewhere too).


Steve.

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
