On Jan 10, 2008, at 10:29 AM, Josh Hursey wrote:
I met with Joseph Cottam (Grad student in my lab at IU) yesterday
about MTT visualization. He is working on some new visualization
techniques and wants to apply them to the MTT dataset.
Awesome.
Since we are ramping up to a v1.3 release, we want the visualization to
support this effort. So we want to make sure that the visualization
will meet the development community's needs. We should probably ask
the devel-core list, but I thought I would start some of the
discussion here to make sure I am asking the right questions of the
group.
Sounds reasonable.
After a first go-round here, we might want to have a conversation with
the OMPI RMs to get their input - that would still be a small group
from which to get targeted feedback on these questions.
To start I have some basic questions:
- How does Open MPI determine that it is stable enough to release?
I personally have a Magic 8 Ball on my desk that I consult frequently
for questions like this. ;-)
It's a mix of many different metrics, actually:
- stuff unrelated to MTT results:
- how many Trac tickets are open against that release, and whether we
care about them
- how urgent are the bug fixes that are included
- external requirements (e.g., get an OMPI release out to meet the
OFED release schedule)
- ...and probably others
- related to MTT results
- "good" coverage on platforms (where "platform" = host arch, OS,
OS version, compiler, compiler version, MCA params, interconnect, and
scheduler -- note that some of these are orthogonal to each other...)
- the only failures and timeouts we have are a) repeatable, b)
consistent across multiple organizations (if relevant), and c) deemed
to be acceptable
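To make that last bullet a little more concrete, here's a rough sketch
(in Python, with made-up field names -- MTT doesn't export anything in
exactly this shape) of the kind of triage I mean:

    # Hypothetical triage pass: flag a failure as a potential release
    # blocker only if it is repeatable, seen at more than one
    # organization, and not already deemed acceptable.
    from collections import defaultdict

    def potential_blockers(results, known_acceptable=()):
        # results: iterable of dicts with hypothetical keys
        # 'test', 'config', 'org', and 'passed' (bool)
        failures = defaultdict(lambda: {"count": 0, "orgs": set()})
        for r in results:
            if not r["passed"]:
                key = (r["test"], r["config"])
                failures[key]["count"] += 1
                failures[key]["orgs"].add(r["org"])
        blockers = []
        for (test, config), info in failures.items():
            repeatable = info["count"] > 1           # (a) repeatable
            multi_org = len(info["orgs"]) > 1        # (b) >1 organization
            acceptable = test in known_acceptable    # (c) deemed acceptable
            if repeatable and multi_org and not acceptable:
                blockers.append((test, config))
        return blockers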
- What dimensions of testing are most/least important (e.g.,
platforms, compilers, feature sets, scale, ...)?
This is a hard question. :-\ I listed several dimensions above:
- host architecture
- OS
- OS version
- compiler
- compiler version
- MCA parameters used
- interconnect
- scheduler
Here's some more:
- number of processes tested
- layout of processes (by node, by proc, etc.)
I don't quite know how to order those in terms of priority. :-\
- What other questions would be useful to answer with regard to
testing (thinking completely outside of the box)?
* Example: Are we testing a specific platform/configuration set
too much/too little?
This is a great question.
I would love to be able to configure this question -- e.g., are we
testing some MCA params too much or too little?
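Once results are exported with the dimensions listed above, answering
"too much/too little" could be as simple as counting runs per
configuration tuple. Just a sketch -- the field names are hypothetical,
not anything the reporter produces today:

    # Sketch: count test runs per configuration tuple to spot over- and
    # under-tested combinations. Dimension names are made up.
    from collections import Counter

    DIMENSIONS = ("arch", "os", "os_version", "compiler",
                  "compiler_version", "mca_params", "interconnect",
                  "scheduler")

    def coverage(results):
        # results: iterable of dicts keyed by the dimension names above
        return Counter(tuple(r.get(d, "unknown") for d in DIMENSIONS)
                       for r in results)

coverage(...).most_common() shows what we hammer on; configurations
that never show up at all are the "too little" side of the question.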
The performance stuff can always be visualized better, especially over
time. One idea is expressed in https://svn.open-mpi.org/trac/mtt/ticket/330.
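As a strawman for the "over time" part (just a sketch with made-up
fields, not the actual reporter schema): reduce the raw performance
numbers to a per-day series so a trend is visible at a glance, e.g.
latency slowly creeping up across a release cycle.

    # Sketch: bucket performance results by day and average them, so
    # the series can be plotted as a timeline. Field names are
    # hypothetical.
    from collections import defaultdict

    def daily_trend(perf_results):
        # perf_results: iterable of dicts with 'date' (YYYY-MM-DD) and
        # 'value' (e.g., latency in usec)
        by_day = defaultdict(list)
        for r in perf_results:
            by_day[r["date"]].append(r["value"])
        return sorted((day, sum(vals) / len(vals))
                      for day, vals in by_day.items())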
I also very much like the ideas in https://svn.open-mpi.org/trac/mtt/ticket/236
and https://svn.open-mpi.org/trac/mtt/ticket/302 (302 is not
expressed as a visualization issue, but it could be -- you can imagine
a tree-based display showing the relationships between phase results,
perhaps even incorporated with a timeline -- that would be awesome).
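To make the tree idea a tiny bit more concrete, here's a sketch of how
the phase relationships could be modeled (the class and field names are
made up; the real MTT phases are MPI Install, Test Build, and Test Run):

    # Sketch: model MPI Install -> Test Build -> Test Run as a tree so
    # a single failed install visibly explains all of its children.
    class PhaseResult:
        def __init__(self, phase, label, passed):
            self.phase = phase    # "MPI Install", "Test Build", "Test Run"
            self.label = label
            self.passed = passed
            self.children = []

        def add(self, child):
            self.children.append(child)
            return child

        def dump(self, indent=0):
            status = "PASS" if self.passed else "FAIL"
            print("  " * indent + self.phase + ": " + self.label
                  + " [" + status + "]")
            for c in self.children:
                c.dump(indent + 1)

A timeline view would then just be this same tree laid out along a date
axis.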
Here's a whacky idea -- can our MTT data be combined with SCM data
(SVN, in this case) to answer questions like:
- what parts of the code are the most troublesome? i.e., when this
part of the code changes, these tests tend to break
- what tests seem to be related to what parts of the OMPI code base?
- who / what SVN commit(s) seemed to cause specific tests to break?
(this seems like a longer-term set of questions, but I thought I'd
bring it up...)
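The core join doesn't seem too exotic, though -- a sketch, assuming we
could get each test's last-known-good and first-bad timestamps out of
MTT and walk the "svn log" output (none of these fields exist today):

    # Sketch: the SVN revisions that landed between the last pass and
    # the first failure of a test are the suspects. 'commits' is a
    # hypothetical export of (revision, timestamp, paths_touched).
    def suspect_revisions(commits, last_pass_time, first_fail_time):
        return [rev for rev, when, _paths in commits
                if last_pass_time < when <= first_fail_time]

Aggregating the paths touched by suspects across many such breakages
would start to answer "which parts of the code are the most
troublesome".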
- Other questions you think we should pose to the group?
We are currently feeling out the domain of possibilities, but hope to
start sketching some ideas in another week or so. This work should
proceed fairly quickly since we are targeting a paper about this for
the ACM Symposium on Software Visualization (http://www.softvis.org/),
which is due in early April. How is that for expecting success? :)
Awesome.
--
Jeff Squyres
Cisco Systems