Charles,

Ira Weiny wrote:
Charles,

Here at LLNL we have been running OpenSM for some time.  Thus far we are very
happy with its performance.  Our largest cluster is 1152 nodes and OpenSM can
bring it up (not counting boot time) in less than a minute.

OpenSM is successfully running on some large clusters with 4-5K nodes.
It takes about 2-3 minutes to bring up such clusters.

Here are some details.

We are running v3.1.10 of OpenSM with some minor modifications (mostly patches
that have been submitted upstream and accepted by Sasha but are not yet in a
release).

Our clusters are all Fat-tree topologies.

We have a node which is more or less dedicated to running OpenSM.  We have some
other monitoring software running on it, but OpenSM can utilize the CPU/Memory
if it needs to.

   A) On our large clusters this node is a 4-socket, dual-core (8 cores
   total) Opteron running at 2.4 GHz with 16 GB of memory.  I don't believe
   OpenSM needs this much, but the nodes were all built the same, so this is
   what it got.

   B) On one of our smaller clusters (128 nodes) OpenSM is running on a
   dual-socket, single-core (2 cores total) 2.4 GHz Opteron node with 2 GB of
   memory.  We have not seen any issues with OpenSM on this cluster.

We run with the up/down algorithm; ftree has not panned out for us yet.  I
can't say how that would compare to the Cisco algorithms.

If the cluster topology is a fat tree, then there are two relevant routing
engines: ftree and up/down. Ftree would be a good choice if you need LMC=0
(and if the topology complies with certain fat-tree rules). For any other
tree, or for LMC>0, up/down should work.
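For reference, and assuming a host-based OpenSM of the 3.1.x vintage discussed above, the routing engine is selected with the `-R` (`--routing_engine`) option. These invocations are only an illustrative sketch, not a recommendation:

```shell
# Up/down routing (works for any tree topology, and for LMC > 0)
opensm -R updn

# Fat-tree routing (a good fit when LMC=0 and the topology
# complies with the fat-tree rules)
opensm -R ftree
```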

-- Yevgeny

In short, OpenSM should work just fine on your cluster.

Hope this helps,
Ira


On Tue, 27 May 2008 11:15:14 -0400
Charles Taylor <[EMAIL PROTECTED]> wrote:

We have a 400-node IB cluster. We are running an embedded SM in failover mode on our TS270/Cisco7008 core switches. Lately we have been seeing problems with LID assignment when rebooting nodes (see log messages below). It is also taking far too long for LIDs to be assigned, as it takes on the order of minutes for the ports to transition to "ACTIVE".

This seems like a bug to us and we are considering switching to OpenSM on a host. I'm wondering about experience with running OpenSM for medium to large clusters (Fat Tree) and what resources (memory/cpu) we should plan on for the host node.

Thanks,

Charlie Taylor
UF HPC Center

May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:14:10 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:14:13 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM OUT_OF_SERVICE
trap for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59
May 27 14:14:13 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:256]: An existing IB
node GUID 00:02:c9:02:00:21:4b:59 LID 194 was removed
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM DELETE_MC_GROUP
trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:21:4b:59
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology
changed
May 27 14:14:14 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
discovering removed ports
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:16:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:16:28 topspin-270sc ib_sm.x[812]: [ib_sm_discovery.c:1009]: no
routing required for port guid 00:02:c9:02:00:21:4b:59, lid 194
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1503]: Topology
changed
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
discovering new ports
May 27 14:16:30 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
multicast membership change
May 27 14:16:30 topspin-270sc ib_sm.x[812]: [ib_sm_assign.c:588]: Force port to
go down due to LID conflict, node - GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:562]: Program port
state, node=00:02:c9:02:00:21:4b:58, port= 16, current state 2, neighbor
node=00:02:c9:02:00:21:4b:58, port= 1, current state 2
May 27 14:18:42 topspin-270sc ib_sm.x[819]: [ib_sm_bringup.c:733]: Failed to
negotiate MTU, op_vl for node=00:02:c9:02:00:21:4b:58, port= 1, mad status 0x1c
May 27 14:18:42 topspin-270sc ib_sm.x[803]: [INFO]: Generate SM IN_SERVICE trap
for GID=fe:80:00:00:00:00:00:00:00:02:c9:02:00:21:4b:59
May 27 14:18:42 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:144]: A new IB node
00:02:c9:02:00:21:4b:59 was discovered and assigned LID 0
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:18:43 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:18:46 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:18:46 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
previous GET/SET operation failures
May 27 14:18:46 topspin-270sc ib_sm.x[816]: [ib_sm_assign.c:545]: Reassigning
LID, node - GUID=00:02:c9:02:00:21:4b:58, port=1, new LID=411, curr LID=0
May 27 14:18:46 topspin-270sc ib_sm.x[816]: [ib_sm_assign.c:588]: Force port to
go down due to LID conflict, node - GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:46 topspin-270sc ib_sm.x[816]: [ib_sm_assign.c:635]: Clean up SA
resources for port forced down due to LID conflict, node -
GUID=00:02:c9:02:00:21:4b:58, port=1
May 27 14:18:47 topspin-270sc ib_sm.x[803]: [ib_sm_assign.c:667]: cleaning DB
for guid 00:02:c9:02:00:21:4b:59, lid 194
May 27 14:18:47 topspin-270sc ib_sm.x[803]: [ib_sm_routing.c:2936]:
_ib_smAllocSubnet: initRate= 4
May 27 14:18:47 topspin-270sc last message repeated 23 times
May 27 14:18:47 topspin-270sc ib_sm.x[803]: [INFO]: Different capacity links
detected in the network
May 27 14:21:01 topspin-270sc ib_sm.x[820]: [ib_sm_bringup.c:516]: Active port(s) now in INIT state node=00:02:c9:02:00:21:4b:58, port=16, state=2,
neighbor node=00:02:c9:02:00:21:4b:58, port=1, state=2
May 27 14:21:01 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:21:01 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:21:01 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1320]: Rediscover
the subnet
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:525]: IB node
00:06:6a:00:d9:00:04:5d port 16 is INIT state
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
some ports in INIT state
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
previous GET/SET operation failures
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [ib_sm_routing.c:2936]:
_ib_smAllocSubnet: initRate= 4
May 27 14:21:05 topspin-270sc last message repeated 23 times
May 27 14:21:05 topspin-270sc ib_sm.x[803]: [INFO]: Different capacity links
detected in the network
May 27 14:23:19 topspin-270sc ib_sm.x[817]: [ib_sm_bringup.c:562]: Program port
state, node=00:02:c9:02:00:21:4b:58, port= 16, current state 2, neighbor
node=00:02:c9:02:00:21:4b:58, port= 1, current state 2
May 27 14:23:24 topspin-270sc ib_sm.x[823]: [INFO]: Generate SM CREATE_MC_GROUP
trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:21:4b:59
May 27 14:23:24 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:23:24 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:23:26 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:23:26 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
multicast membership change
May 27 14:23:33 topspin-270sc ib_sm.x[826]: [INFO]: Standby SM guid
00:05:ad:00:00:02:3c:60, is no longer synchronized with Master SM
May 27 14:25:39 topspin-270sc ib_sm.x[826]: [INFO]: Initialize a backup session
with Standby SM guid 00:05:ad:00:00:02:3c:60
May 27 14:25:39 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1875]: async events
require sweep
May 27 14:25:39 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:25:39 topspin-270sc ib_sm.x[826]: [INFO]: Standby SM guid
00:05:ad:00:00:02:3c:60, started synchronizing with Master SM
May 27 14:25:42 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change
May 27 14:25:42 topspin-270sc ib_sm.x[803]: [INFO]: Configuration caused by
multicast membership change
May 27 14:25:43 topspin-270sc ib_sm.x[826]: [INFO]: Master SM DB synchronized
with Standby SM guid 00:05:ad:00:00:02:3c:60
May 27 14:25:43 topspin-270sc ib_sm.x[826]: [INFO]: Master SM DB synchronized
with all designated backup SMs
May 27 14:28:04 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1914]:
**********************  NEW SWEEP ********************
May 27 14:28:06 topspin-270sc ib_sm.x[803]: [ib_sm_sweep.c:1516]: No topology
change

On May 23, 2008, at 2:20 PM, Steve Wise wrote:

Or Gerlitz wrote:
Steve Wise wrote:
Are we sure we need to expose this to the user?
I believe this is the way to go if we want to let smart ULPs generate a new rkey/stag per mapping. Simpler ULPs could then just use the same value for each map associated with the same MR.

Or.

How should I add this to the API?

Perhaps we just document the format of an rkey in struct ib_mr. The app would then do this to change the key before posting the fast_reg_mr WR (coded to be explicit, not efficient):

u8 newkey;
u32 newrkey;

newkey = 0xaa;
newrkey = (mr->rkey & 0xffffff00) | newkey;
mr->rkey = newrkey;
wr.wr.fast_reg.mr = mr;
...


Note that this assumes mr->rkey is in host byte order (I think the Linux RDMA code assumes this elsewhere too).


Steve.

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
