On Tue, 16 Oct 2007 16:35:38 -0700 Edward Mascarenhas <[EMAIL PROTECTED]> wrote:
> Has anyone seen issues with running OpenSM on large (1500+ node)
> clusters?
>
> We are seeing 1000s of the following message in the system log:
>
> __osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is
> already overloaded with 6736 messages and queue time of:10006[msec]
>
> It seems like a huge number of datagrams are being generated, resulting
> in increased time to bring up the fabric.
>
> Is there a threshold of cluster size beyond which we are likely to see
> these messages?
>
> How many MADs are generated during bring-up?
>
> What is the largest cluster size for which OpenSM has been tried by
> others?

We have atlas running with 1152 nodes. OpenSM is able to route it with
up/down routing in about 2 minutes. We don't see messages like the ones
you quote above, but we have been using the OpenSM from OFED 1.2.

Hope this helps,
Ira
