On 16:35 Tue 16 Oct, Edward Mascarenhas wrote:
>
> Has anyone seen issues with running OpenSM on large (1500+ nodes)
> clusters?
>
> We are seeing 1000s of the following message in the system log:
>
> __osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is
> already overloaded with 6736 messages and queue time of:10006[msec]

I guess you see this during fabric bring-up, when the SA processor is not available yet. Which version of OpenSM are you using? We made some improvements in this area in recent versions (partially in OFED-1.2).

> It seems like a huge number of datagrams are being generated, resulting
> in increased time to bring up the fabric.
>
> Is there a threshold of cluster size beyond which we are likely to see
> these messages?
>
> How many MADs are generated during bring-up?

A lot :). The exact number depends on the exact topology and the requested configuration. Could you send us the output of ibnetdiscover?

> What is the largest cluster size for which OpenSM has been tried by
> others?

I hope others will answer. The largest cluster I know of was Thunderbird (4480 nodes); some details are here:

http://openfabrics.org/archives/nov2006sc/ofa_devel_111606.pdf

Sasha
