Re: [ofa-general] Running OpenSM on large clusters

Ira Weiny Wed, 17 Oct 2007 11:39:01 -0700

On Tue, 16 Oct 2007 16:35:38 -0700
Edward Mascarenhas <[EMAIL PROTECTED]> wrote:


> 
> Has anyone seen issues with running OpenSM on large (1500+ nodes) 
> clusters?
> 
> We are seeing thousands of the following message in the system log:
> 
> __osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is 
> already overloaded with 6736 messages and queue time of:10006[msec]
> 
> It seems like a huge number of datagrams are being generated resulting 
> in increased time to bring up the fabric. 
> 
> Is there a threshold of cluster size beyond which we are likely to see 
> these messages?
> 
> How many MADs are generated during bring up?
> 
> What is the largest cluster size for which OpenSM has been tried by 
> others?
> 

We have atlas running with 1152 nodes.  OpenSM is able to route it with up/down
routing in about 2 minutes.

We don't see messages like the one you quote above, but we have been using the
OpenSM from OFED 1.2.

Hope this helps,
Ira
