Has anyone seen issues running OpenSM on large (1500+ node) clusters?
We are seeing thousands of the following message in the system log:

  __osm_sa_mad_ctrl_process: Dropping MAD since the dispatcher is already overloaded with 6736 messages and queue time of:10006[msec]

It seems that a huge number of management datagrams (MADs) are being generated, which increases the time it takes to bring up the fabric.

Is there a cluster size beyond which we are likely to see these messages? How many MADs are generated during bring-up? What is the largest cluster on which others have run OpenSM?

Thanks,
Edward

_______________________________________________
general mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
