Yes, you're right -- we should have reviewed this code 2 weeks ago when you asked. Sorry about that. :-\

Per adding lots of affinity code in ompi_mpi_init.c: perhaps those code belongs down in the paffinity (or rmaps?) base. It doesn't have to become part of any specific paffinity component (because it can be used with any paffinity component) This makes it callable by anyone (including orte) and keeps the abstraction barriers clean.



On Mar 19, 2008, at 5:36 AM, Lenny Verkhovsky wrote:

1. I see a getenv("slot_list") in the MPI side of the code; it
looks
like $slot_list is set by the odls for the MPI process.  Why isn't
it
an MCA parameter?  That's what all other values passed by the orted
to
the MPI process appear to be.

"slot_list" consist of socket:core pair for the rank to be bind to. This info changes according to rankfile and different for each node and rank,
therefore it cannot be passed via mca parameter.

I don't follow the logic here. MCA parameters can certainly be unique per MPI process...

Remember that MCA parameters can be environment variables. The advantage of using MCA params as env variables is that we enforce a common prefix to ensure that we don't collide with other environment variables. There's functions to get the environment variable names of MCA parameters, for example, so that you can setenv them to pass them to another process (e.g., in the odls). Then you use the normal MCA parameter lookup functions to retrieve them in the target/receiver process.

2. I see that ompi_mpi_params.c is now registering 2 rmaps-level
MCA
parameters.  Why?  Shouldn't these be in ORTE somewhere?

If you mean paffinity_alone and rank_file_debug, then
1. paffinity_alone was there before.
2. After getting some answers from Ralph about orte_debug in
ompi_mpi_init I intend to introduce ompi_debug mca parameter that will
be used in this library and rank_file_debug will be removed.

rmaps_rank_file_path and rmaps_rank_file_debug. These have no place being registered in the OMPI layer.

It looks like rank_file_path is only registered in ompi_mpi_init.c as an error check. Why isn't this done in the rmaps rankfile component itself? This would execute in mpirun and avoid launching at all if an error is detected (vs. detecting the error in each MPI process and aborting each one).

A few more notes:

3. Most of the files in orte/mca/rmaps/rankfile do not obey the
prefix
rule.  I think that they should be renamed.

Rank_file component was copied from round_robin, I thought it would be
strange if it would look differently.

Blah -- it looks like round robin's files don't adhere to the prefix rule. In fairness, those files *may* be so old to predate the prefix rule...?

Regardless, I think the rankfile files should be named in accordance with the rest of the code base and adhere to the prefix rule. round robin should probably be fixed as well.

4. A quick look through rankfile_lex.l seems to show that there are
global variables that are not protected by the prefix rule (or
static).  Ditto in rmaps_rf.c.  These should be fixed.

What do you mean?

From lex.l:

int rank_file_line=1;
rank_file_value_t rank_file_value;
bool rank_file_done = false;

These are neither static nor do they adhere to the prefix rule (obviously, if a symbol is static, it doesn't have to adhere to the prefix rule). Ditto for "rank_file_path" and "rankmap" in rmaps_rf.c. There may be others; that's all I looked through (e.g., I didn't check other files or check function symbols).

--
Jeff Squyres
Cisco Systems

Reply via email to