Having gone around in circles on hostfile-related issues for over five years
now, I honestly have little motivation to re-open the entire discussion
again. It doesn't seem to be that daunting a requirement for those who are
using it, so I'm inclined to just leave well enough alone.
:-)


On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <eugene....@sun.com> wrote:

>  Ralph Castain wrote:
>
> The two files have a slightly different format
>
> Agreed.
>
> and completely different meaning.
>
> Somewhat agreed.  They're both related to mapping processes onto a cluster.
>
> The hostfile specifies how many slots are on a node. The rankfile specifies
> a rank and what node/slot it is to be mapped onto.
>
> Agreed.
>
> Rankfiles can use relative node indexing and refer to nodes received from a
> resource manager - i.e., without any hostfile.
>
> This is the main part I'm concerned about.  E.g.,
>
> % cat rankfile
> rank 0=node0 slot=0
> rank 1=node1 slot=0
> % mpirun -np 2 -rf rankfile ./a.out
> --------------------------------------------------------------------------
> Rankfile claimed host node1 that was not allocated or oversubscribed it's
> slots:
>
> --------------------------------------------------------------------------
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> rmaps_rank_file.c at line 107
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rmaps_base_map_job.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/plm_base_launch_support.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> plm_rsh_module.c at line 1016
> % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
> 0 on node0
> 1 on node1
> done
>
> It seems to me that the rankfile has sufficient information to express what
> I want it to do.  But mpirun won't accept this.  To fix this, I have to,
> e.g., supply/maintain/specify redundant information in a hostfile or host
> list.
>
> So the files are intentionally quite different. Trying to combine them
> would be rather ugly.
>
> Right.  And my issue is that I'm forced to use both when I only want
> rankfile functionality.
>
>  On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <eugene....@sun.com> wrote:
>
>> In order to use "mpirun --rankfile", I also need to specify
>> hosts/hostlist.  But that information is redundant with what I provide in
>> the rankfile.  So, from a user's point of view, this strikes me as broken.
>>  Yes?  Should I file a ticket, or am I missing something here about this
>> functionality?
>>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
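
For anyone who wants to avoid maintaining the host list by hand in the meantime, here is a minimal sketch of a wrapper that derives the `-host` argument from the rankfile itself. It is not part of Open MPI; the function names `hosts_from_rankfile` and `mpirun_command` are hypothetical, and it assumes every rankfile line has exactly the `rank N=host slot=S` form shown in the example above. Relative node indexing and any other rankfile syntax are not handled.

```python
import re

# Assumption: rankfile lines look exactly like "rank 0=node0 slot=0",
# as in the example quoted in this thread. Other forms are ignored.
RANK_LINE = re.compile(r"^\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)\s*$")


def hosts_from_rankfile(text):
    """Return the host names used in the rankfile text, in first-seen order."""
    hosts = []
    for line in text.splitlines():
        match = RANK_LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        host = match.group(2)
        if host not in hosts:
            hosts.append(host)
    return hosts


def mpirun_command(rankfile_path, rankfile_text, program):
    """Build the argument list for the mpirun invocation that worked above.

    Uses only the flags that appear in the thread: -np, -host, -rf.
    """
    nranks = sum(
        1 for line in rankfile_text.splitlines() if RANK_LINE.match(line)
    )
    hosts = hosts_from_rankfile(rankfile_text)
    return [
        "mpirun", "-np", str(nranks),
        "-host", ",".join(hosts),
        "-rf", rankfile_path,
        program,
    ]


if __name__ == "__main__":
    example = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
    print(" ".join(mpirun_command("rankfile", example, "./a.out")))
```

The resulting list can be passed to `subprocess.run`. This only works around the redundancy on the user side; it does not change how mpirun treats a rankfile given without a hostfile or host list.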
