Re: [OMPI devel] RFC: Diagnostic framework for MPI

Eugene Loh Tue, 26 May 2009 19:16:33 -0400

Nadia Derbey wrote:

What: Warn the administrator when unusual events are occurring too
frequently.


Why: Such unusual events might be the symptom of some problem that can
easily be fixed (by a better tuning, for example)

Before Sun HPC ClusterTools adopted the Open MPI code base (that is, CT6
and earlier), there was some performance analysis support called
MPProf. See
http://docs.sun.com/source/819-4134-10/profile.html#pgfId-999249 . The
key characteristic was supposed to be that it would be very easy to
use: set an environment variable before running; run a report generator
afterwards; report is self explanatory; data volumes were relatively
small and so easy to manage.

One part in particular seemed germane to your RFC: advice on
implementation-specific environment variables. See
http://docs.sun.com/source/819-4134-10/profile.html#pgfId-1000209 . Sun
MPI had instrumentation embedded in it that looked for various
"performance conditions". Then, in post processing, the report
generator would translate that information into user-actionable
feedback. At least, that was the concept. The idea would be that all
user feedback should include:

*) a brief explanation of what happened ("you ran out of postboxes...
see Appendix A.1.b.23 of user guide if you really dare to understand
what this means")

*) an estimate of how important this is ("we think this cost you 10%
performance")

*) a concise description of what to do to improve performance and
discussion of ramifications ("set the environment variable
MPI_NUMPOSTBOX to 256 and rerun, this will cost about 50 Mbyte more
memory per process")
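
To make the three-part feedback concrete, here is a minimal sketch of what a post-processing step might look like. It is not MPProf's actual implementation: the counter names (postbox_stall_seconds, wall_seconds), the 1% reporting threshold, and the function names are all hypothetical, and the recommended setting is simply the example text from the list above rather than a computed value.

```python
from dataclasses import dataclass


@dataclass
class Advice:
    what_happened: str   # brief explanation of the condition
    importance: str      # estimate of how much it cost
    what_to_do: str      # concrete action plus its ramifications


def advise_postboxes(counters, min_percent=1.0):
    """Turn raw counters into advice, or None if the cost is negligible.

    The counter names and the 1% threshold are assumptions made for
    this sketch; a real tool would define its own.
    """
    stalled = counters.get("postbox_stall_seconds", 0.0)
    total = counters["wall_seconds"]
    percent = 100.0 * stalled / total
    if percent < min_percent:
        return None
    return Advice(
        what_happened="you ran out of postboxes",
        importance=f"we think this cost you {percent:.0f}% performance",
        # Hard-coded example text; a real report generator would derive
        # the suggested value and memory cost from the collected data.
        what_to_do=("set the environment variable MPI_NUMPOSTBOX to 256 "
                    "and rerun; this will cost about 50 Mbyte more memory "
                    "per process"),
    )


def format_report(advice_items):
    """Render each piece of advice as explanation, impact, and action."""
    lines = []
    for a in advice_items:
        lines.append(f"*) {a.what_happened}")
        lines.append(f"   impact: {a.importance}")
        lines.append(f"   action: {a.what_to_do}")
    return "\n".join(lines)
```

Calling advise_postboxes on the counters gathered during a run and passing the non-None results to format_report would give the user one short entry per detected condition, with conditions below the threshold left out so the report stays small.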

The feedback need not be limited to environment variables or
implementation-specific conditions. E.g., perhaps one could detect when
MPI_Ssend is used in place of MPI_Send and how much performance
(unneeded synchronization) that cost.
