Re: [OMPI devel] -display-map

Ralph Castain Mon, 20 Oct 2008 16:49:25 -0400

Hmmm...just to be sure we are all clear on this. The reason we proposed to use mpirun is that "hostfile" has no meaning outside of mpirun. That's why ompi_info can't do anything in this regard.

We have no idea what hostfile the user may specify until we actually get the mpirun cmd line. They may have specified a default-hostfile, but they could also specify hostfiles for the individual app_contexts. These may or may not include the node upon which mpirun is executing.

So the only way to provide you with a separate command to get a hostfile<->nodename mapping would be for you to provide us with the default-hostfile and/or hostfile cmd line options just as if you were issuing the mpirun cmd. We just wouldn't launch - but it would be the exact equivalent of doing "mpirun --do-not-launch".

Am I missing something? If so, please do correct me - I would be happy to provide a tool if that would make it easier. Just not sure what that tool would do.


Thanks
Ralph


On Oct 19, 2008, at 1:59 PM, Greg Watson wrote:

Ralph,
It seems a little strange to be using mpirun for this, but barring providing a separate command, or using ompi_info, I think this would solve our problem.
Thanks,

Greg

On Oct 17, 2008, at 10:46 AM, Ralph Castain wrote:
Sorry for delay - had to ponder this one for awhile.
Jeff and I agree that adding something to ompi_info would not be a good idea. Ompi_info has no knowledge or understanding of hostfiles, and adding that capability to it would be a major distortion of its intended use.
However, we think we can offer an alternative that might better solve the problem. Remember, we now treat hostfiles in a very different manner than before - see the wiki page for a complete description, or "man orte_hosts".
So the problem is that, to provide you with what you want, we need to "dump" the information from whatever default-hostfile was provided, and, if no default-hostfile was provided, then the information from each hostfile that was provided with an app_context.
The best way we could think of to do this is to add another mpirun cmd line option --dump-hostfiles that would output the line-by-line name from the hostfile plus the name we resolved it to. Of course, --xml would cause it to be in xml format.
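To make the proposed output concrete, here is a minimal sketch of the kind of mapping described above: each hostfile entry paired with the name it resolves to. It is an illustration only, not mpirun's implementation. The function name dump_hostfile is hypothetical, the parsing assumes the common "name slots=N" layout with "#" comments, and socket.getfqdn stands in for whatever resolution ORTE actually applies.

```python
import socket


def dump_hostfile(lines, resolve=socket.getfqdn):
    """Pair each hostfile entry with the name it resolves to.

    `lines` is any iterable of hostfile lines. `resolve` is injectable so
    the mapping can be checked without touching DNS; the default
    (socket.getfqdn) is an assumption, not ORTE's actual resolution rule.
    """
    mapping = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # The first token is the host entry; the rest (e.g. slots=4) is ignored.
        entry = line.split()[0]
        mapping.append((entry, resolve(entry)))
    return mapping
```

A caller could print each pair as "entry -> resolved" or wrap the pairs in XML, which is the pairing a tool outside the cluster would need in order to match map output to hostfile lines.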
Would that meet your needs?

Ralph


On Oct 15, 2008, at 3:12 PM, Greg Watson wrote:
Hi Ralph,
We've been discussing this back and forth a bit internally and don't really see an easy solution. Our problem is that Eclipse is not running on the head node, so gethostbyname will not necessarily resolve to the same address. For example, the hostfile might refer to the head node by an internal network address that is not visible to the outside world. Since gethostbyname also looks in /etc/hosts, it may resolve locally but not on a remote system. The only thing I can think of would be, rather than us reading the hostfile directly as we do now, to provide an option to ompi_info that would dump the hostfile using the same rules that you apply when you're using the hostfile. Would that be feasible?
Greg

On Sep 22, 2008, at 4:25 PM, Ralph Castain wrote:
Sorry for delay - was on vacation and am now trying to work my way back to the surface.
I'm not sure I can fix this one for two reasons:
1. In general, OMPI doesn't really care what name is used for the node. However, the problem is that it needs to be consistent. In this case, ORTE has already used the name returned by gethostname to create its session directory structure long before mpirun reads a hostfile. This is why we retain the value from gethostname instead of allowing it to be overwritten by the name in whatever allocation we are given. Using the name in hostfile would require that I either find some way to remember any prior name, or that I tear down and rebuild the session directory tree - neither seems attractive nor simple (e.g., what happens when the user provides multiple entries in the hostfile for the node, each with a different IP address based on another interface in that node? Sounds crazy, but we have already seen it done - which one do I use?).
2. We don't actually store the hostfile info anywhere - we just use it and forget it. For us to add an XML attribute containing any hostfile-related info would therefore require us to re-read the hostfile. I could have it do that -only- in the case of "XML output required", but it seems rather ugly.
An alternative might be for you to simply do a "gethostbyname" lookup of the IP address or hostname to see if it matches instead of just doing a strcmp. This is what we have to do internally as we frequently have problems with FQDN vs. non-FQDN vs. IP addresses etc. If the local OS hasn't cached the IP address for the node in question it can take a little time to DNS resolve it, but otherwise works fine.
I can point you to the code in OPAL that we use - I would think something similar would be easy to implement in your code and would readily solve the problem.
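As a rough illustration of comparing by resolved address instead of by string, a client-side check might look like the sketch below. This is not the OPAL code: the helper names resolve_addresses and same_host are hypothetical, and the sketch uses Python's socket.getaddrinfo in place of gethostbyname. The lookup is injectable so the logic can be exercised without a network.

```python
import socket


def resolve_addresses(name, lookup=socket.getaddrinfo):
    """Return the set of IP addresses `name` resolves to (empty on failure)."""
    try:
        infos = lookup(name, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address.
    return {info[4][0] for info in infos}


def same_host(a, b, lookup=socket.getaddrinfo):
    """True if `a` and `b` are equal strings or share a resolved address."""
    if a == b:
        return True
    return bool(resolve_addresses(a, lookup) & resolve_addresses(b, lookup))
```

With the default lookup, the result depends on the resolver of the machine running the check, so the FQDN shown in the map and an IP address from the hostfile only compare equal if both resolve to a common address from that machine.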
Ralph

On Sep 19, 2008, at 7:18 AM, Greg Watson wrote:
Ralph,
The problem we're seeing is just with the head node. If I specify a particular IP address for the head node in the hostfile, it gets changed to the FQDN when displayed in the map. This is a problem for us as we need to be able to match the two, and since we're not necessarily running on the head node, we can't always do the same resolution you're doing.
Would it be possible to use the same address that is specified in the hostfile, or alternatively provide an XML attribute that contains this information?
Thanks,

Greg

On Sep 11, 2008, at 9:06 AM, Ralph Castain wrote:
Not in that regard, depending upon what you mean by "recently". The only changes I am aware of wrt nodes consisted of some changes to the order in which we use the nodes when specified by hostfile or -host, and a little #if protectionism needed by Brian for the Cray port.
Are you seeing this for every node? Reason I ask: I can't offhand think of anything in the code base that would replace a host name with the FQDN because we don't get that info for remote nodes. The only exception is the head node (where mpirun sits) - in that lone case, we default to the name returned to us by gethostname(). We do that because the head node is frequently accessible on a more global basis than the compute nodes - thus, the FQDN is required to ensure that there is no address confusion on the network.
If the user refers to compute nodes in a hostfile or -host (or in an allocation from a resource manager) by non-FQDN, we just assume they know what they are doing and the name will correctly resolve to a unique address.
On Sep 10, 2008, at 9:45 AM, Greg Watson wrote:
Hi,
Has the behavior of the -display-map option changed recently in the 1.3 branch? We're now seeing the host name as a fully resolved DN rather than the entry that was specified in the hostfile. Is there any particular reason for this? If so, would it be possible to add the hostfile entry to the output since we need to be able to match the two?
Thanks,

Greg
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel