Having gone around in circles on hostfile-related issues for over five years now, I honestly have little motivation to re-open the entire discussion again. It doesn't seem to be that daunting a requirement for those who are using it, so I'm inclined to just leave well enough alone. :-)

On Fri, Jun 19, 2009 at 2:21 PM, Eugene Loh <eugene....@sun.com> wrote:

> Ralph Castain wrote:
>
> > The two files have a slightly different format
>
> Agreed.
>
> > and completely different meaning.
>
> Somewhat agreed. They're both related to mapping processes onto a cluster.
>
> > The hostfile specifies how many slots are on a node. The rankfile specifies
> > a rank and what node/slot it is to be mapped onto.
>
> Agreed.
>
> > Rankfiles can use relative node indexing and refer to nodes received from a
> > resource manager - i.e., without any hostfile.
>
> This is the main part I'm concerned about. E.g.,
>
> % cat rankfile
> rank 0=node0 slot=0
> rank 1=node1 slot=0
> % mpirun -np 2 -rf rankfile ./a.out
> --------------------------------------------------------------------------
> Rankfile claimed host node1 that was not allocated or oversubscribed it's
> slots:
>
> --------------------------------------------------------------------------
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> rmaps_rank_file.c at line 107
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rmaps_base_map_job.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/plm_base_launch_support.c at line 86
> [node0:14611] [[61560,0],0] ORTE_ERROR_LOG: Bad parameter in file
> plm_rsh_module.c at line 1016
> % mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
> 0 on node0
> 1 on node1
> done
>
> It seems to me that the rankfile has sufficient information to express what
> I want it to do. But mpirun won't accept this. To fix this, I have to,
> e.g., supply/maintain/specify redundant information in a hostfile or host
> list.
>
> > So the files are intentionally quite different. Trying to combine them
> > would be rather ugly.
>
> Right. And my issue is that I'm forced to use both when I only want
> rankfile functionality.
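
For anyone who wants to avoid maintaining the host list by hand in the meantime, the `-host` argument can be derived from the rankfile itself. The following is a minimal sketch, not part of Open MPI: the function names `hosts_from_rankfile` and `mpirun_command` are hypothetical, and it assumes the rankfile only contains lines of the simple `rank N=host slot=S` form quoted above (relative node indexing is not handled).

```python
import re

# Matches lines such as "rank 0=node0 slot=0". Only this simple form is
# assumed here; other rankfile syntax is ignored by this sketch.
RANK_LINE = re.compile(r"^\s*rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)")


def hosts_from_rankfile(text):
    """Return the distinct host names in a rankfile, in first-seen order."""
    hosts = []
    for line in text.splitlines():
        match = RANK_LINE.match(line)
        if match and match.group(2) not in hosts:
            hosts.append(match.group(2))
    return hosts


def mpirun_command(rankfile_path, rankfile_text, program):
    """Build an mpirun argument list whose -host value comes from the rankfile."""
    hosts = hosts_from_rankfile(rankfile_text)
    # One process per "rank" line in the file.
    nprocs = sum(1 for line in rankfile_text.splitlines() if RANK_LINE.match(line))
    return ["mpirun", "-np", str(nprocs), "-host", ",".join(hosts),
            "-rf", rankfile_path, program]


example = "rank 0=node0 slot=0\nrank 1=node1 slot=0\n"
print(" ".join(mpirun_command("rankfile", example, "./a.out")))
# prints: mpirun -np 2 -host node0,node1 -rf rankfile ./a.out
```

For the two-line rankfile from the example, this reproduces the second (working) command line, so the redundant information at least lives in only one file.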
>
> On Thu, Jun 18, 2009 at 1:52 PM, Eugene Loh <eugene....@sun.com> wrote:
>
>> In order to use "mpirun --rankfile", I also need to specify
>> hosts/hostlist. But that information is redundant with what I provide in
>> the rankfile. So, from a user's point of view, this strikes me as broken.
>> Yes? Should I file a ticket, or am I missing something here about this
>> functionality?
>>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>