A user noticed a specific change that we made between 1.4.2 and 1.4.3:

    https://svn.open-mpi.org/trac/ompi/changeset/23448

which is from CMR https://svn.open-mpi.org/trac/ompi/ticket/2489, and 
originally from trunk https://svn.open-mpi.org/trac/ompi/changeset/23434.  I 
removed the opal_progress_event_users_decrement() from ompi_mpi_init() because 
the ORTE DPM does its own _increment() and _decrement().

However, it seems that there was an unintended consequence of this -- look at 
the annotated Ganglia graph that the user sent (see attached).  In 1.4.2, all 
of the idle time was "user" CPU usage.  In 1.4.3, it's split between user and 
system CPU usage.  The application that he used to test is basically an init / 
finalize test (with some additional MPI middleware).  See:

    http://www.open-mpi.org/community/lists/users/2010/11/14773.php

Can anyone think of why this occurs, and/or if it's a Bad Thing?

If removing this decrement caused a bunch more system CPU time, that would 
seem to imply that we're now calling into libevent more frequently than we 
used to (vs. just polling the opal event callbacks in user space), and 
therefore that there might now be an unmatched _increment() somewhere.

Right...?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/