On 11:38 Wed 17 Oct, Ira Weiny wrote:
> On Tue, 16 Oct 2007 16:35:38 -0700
> Edward Mascarenhas <[EMAIL PROTECTED]> wrote:
>
> >
> > Has anyone seen issues with running OpenSM on large (1500+ nodes)
> > clusters?
> >
> > We are seeing 1000s of the following message in the system log
> >
> > __osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is
> > already overloaded with 6736 messages and queue time of:10006[msec]
> >
> > It seems like a huge number of datagrams are being generated resulting
> > in increased time to bring up the fabric.
> >
> > Is there a threshold of cluster size beyond which we are likely to see
> > these messages.
> >
> > How many MADs are generated during bring up?
> >
> > What is the largest cluster size for which OpenSM has been tried by
> > others?
> >
>
> We have atlas running with 1152 nodes.  OpenSM is able to route it with
> up/down routing in ~2min.

2min is a lot for OpenSM with up/down. Is it pure OpenSM time or from
bring-up power-on?

Sasha

> We don't see messages like you state above.  But we have been using the
> OpenSM from OFED 1.2.
>
> Hope this helps,
> Ira
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
