Ah - interesting scenario!

Definitely a "bug" in the code, then. What it looks like, though, is that
jdata->num_procs is wrong. There shouldn't be any way for the num_procs
in the node array to differ from jdata->num_procs.

My guess is that the rank_file mapper isn't correctly maintaining the
bookkeeping when we map the procs beyond those in the rankfile. I'll dig
into it - have to fix something for Lenny anyway.

In the meantime, this change looks fine regardless, as it (a) is better code
and (b) protects us against such errors.
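
For anyone following along, here is a minimal sketch of the kind of guard
being discussed. It is not the actual nidmap code: the sketch_* types and
sketch_pack_node_ids() are hypothetical stand-ins invented for illustration,
assuming a simple rank-to-node table rather than the real packed buffers. The
idea is only that the table is sized by the job's num_procs while the fill
loop walks the procs recorded on each node, so any rank that falls outside
the allocation is skipped and counted instead of written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the job/node bookkeeping.
 * These are NOT the real ORTE structures. */
typedef struct {
    int rank;                     /* global rank of the process */
} sketch_proc_t;

typedef struct {
    int index;                    /* position of the node in the job map */
    size_t num_procs;             /* procs actually mapped onto this node */
    const sketch_proc_t *procs;
} sketch_node_t;

typedef struct {
    size_t num_procs;             /* what the job claims; may be stale */
    size_t num_nodes;
    const sketch_node_t *nodes;
} sketch_job_t;

/*
 * Build a rank -> node-index table sized by job->num_procs.
 * Ranks that were never mapped keep the value -1. Procs whose rank lies
 * outside the allocation are skipped and counted in *skipped rather than
 * written out of bounds.
 * Returns NULL if num_procs is 0 or the allocation fails; caller frees.
 */
int *sketch_pack_node_ids(const sketch_job_t *job, size_t *skipped)
{
    assert(job != NULL && skipped != NULL);
    *skipped = 0;

    if (job->num_procs == 0) {
        return NULL;
    }

    int *node_ids = malloc(job->num_procs * sizeof *node_ids);
    if (node_ids == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < job->num_procs; i++) {
        node_ids[i] = -1;
    }

    for (size_t n = 0; n < job->num_nodes; n++) {
        const sketch_node_t *node = &job->nodes[n];
        for (size_t p = 0; p < node->num_procs; p++) {
            int rank = node->procs[p].rank;
            /* The guard: never index past what was allocated. */
            if (rank < 0 || (size_t)rank >= job->num_procs) {
                (*skipped)++;
                continue;
            }
            node_ids[rank] = node->index;
        }
    }
    return node_ids;
}
```

With num_procs left at 4 and six procs actually recorded across two nodes,
this fills ranks 0-3 and reports two skipped procs rather than writing past
the end of the array. A non-zero skipped count is also a cheap signal that
the bookkeeping upstream is inconsistent.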

Thanks for catching it!
Ralph


On Wed, Jul 15, 2009 at 2:30 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:

> I think I found a better solution (in r21688). Here is what I was trying to
> do.
>
> I have a more or less homogeneous cluster. In fact all processors are
> identical, except that some are quad core and some dual core. Of course I
> care how my processes are mapped on the quad cores, but not really on the
> dual cores.
>
> My approach was to use the following configuration files.
>
> In /home/bosilca/.openmpi/mca-params.conf I have:
>
> orte_default_hostfile=/home/bosilca/.openmpi/machinefile
> rmaps_rank_file_path = /home/bosilca/.openmpi/rankfile
> rmaps_rank_file_priority = 100
>
> In /home/bosilca/.openmpi/machinefile I have the full description of the
> cluster. As an example:
> node01 slots=4
> node02 slots=4
> node03 slots=2
> node04 slots=2
>
> And in the /home/bosilca/.openmpi/rankfile file I have:
> rank 0=+n0 slot=0
> rank 1=+n0 slot=1
> rank 2=+n1 slot=0
> rank 3=+n1 slot=1
>
> As long as I spawned jobs with fewer than 4 processes everything worked fine.
> But when I used more than 4 processes, orterun segfaulted. After debugging I
> found that the nodes, lrank and nrank arrays were allocated based on
> jdata->num_procs, but then filled based on the total number of processes in
> the jdata->nodes array. As it appears that jdata->num_procs is somehow
> modified based on the number of entries in the rankfile, we end up writing
> outside the allocation and then segfault. Now with the latest patch, we can
> cope with such a scenario by only packing the known information (and thus
> not writing outside the allocated arrays).
>
> This might not be the best approach, but it is doing what I'm looking for
> ...
>
>  george.
>
>
> On Jul 15, 2009, at 15:50 , Ralph Castain wrote:
>
>> The routed comm system relies on each daemon having complete information
>> as to where every process is located, so the expectation was that only full
>> maps would ever be sent. Thus, the nidmap code is set up to always send a
>> full map.
>>
>> I don't know how to even generate a "partial" map. I assume you are doing
>> something offline? Is this to update changed info? If so, you'll also have
>> to do something to update the daemon's maps or the comm system will break
>> down.
>>
>> Ralph
>>
>> On Wed, Jul 15, 2009 at 1:40 PM, George Bosilca <bosi...@eecs.utk.edu>
>> wrote:
>> I have a question regarding the mapping. How can I declare a partial
>> mapping? In fact I only care about how some of the processes are mapped on
>> some specific nodes. Right now if the rmaps doesn't contain information
>> about all nodes, we give up (before this patch we segfaulted).
>>
>> Does it mean we always have to declare the whole mapping, or is it just
>> that we overlooked this strange case?
>>
>>  george.
>>
>> Begin forwarded message:
>>
>>
>> Author: bosilca
>> Date: 2009-07-15 15:36:53 EDT (Wed, 15 Jul 2009)
>> New Revision: 21686
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/21686
>>
>> Log:
>> Reorder the nidmap encoding function. Add a check to make sure we don't
>> write outside the boundaries of the allocated array.
>>
>> However, the problem is still there. If we have an rmaps file containing
>> only partial information, num_procs gets set to the wrong value (the number
>> of hosts in the rmaps file instead of the number of processes requested on
>> the command line).
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
