This sounds like excellent progress! Jeff and others know much more about MTT 
than I do, so I'll leave that question to them.

You have two approaches to the mmap issue. The easiest for now would be to simply 
disable the shared memory component - you can either turn it off at run time 
with -mca btl ^sm, or you can direct that it not even be built by passing 
--enable-mca-no-build=btl-sm when configuring OMPI.

I would think your TCP comm would then allow the two procs sharing a host to 
communicate. Can you give that a try?
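
For example (the application name and process count here are just illustrative):

```shell
# Run time: exclude the sm BTL for this job (the ^ negates the list)
mpirun -np 2 -mca btl ^sm ./my_app

# Build time: configure OMPI so the sm BTL is never compiled at all
./configure --enable-mca-no-build=btl-sm
```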

I'd be happy to begin reviewing the changes, and can help integrate them back 
into the OMPI trunk, when you feel ready.


On Aug 12, 2010, at 9:35 PM, 张晶 wrote:

> Hi Ralph, Jeff, and all
> 
>        It is good news that I can almost run Open MPI on VxWorks, but there 
> are still some bugs. The latest test that passed is: a rank 0 process 
> calling MPI_Send on host 0, and a rank 1 process calling MPI_Recv on host 1. 
> That works well. Because VxWorks lacks mmap(), which the sm BTL component 
> uses, running two processes on the same host still fails.
>        The differences between VxWorks and Unix are the real trouble. For 
> example, pipe(), fork(), exec(), socketpair(), fcntl(), sshd, and so on are 
> not implemented on VxWorks. Replacing these missing pieces with 
> corresponding functions is the key work of the port. After getting a clear 
> understanding of what the rsh component does, I wrote a simple daemon and 
> client to launch the orted, replacing the rlogin() call in VxWorks user 
> space.
>        I think there are still many tests that need to be run. Maybe I'd 
> better look into MTT.
> 
> On July 8, 2010, at 9:54 AM, 张晶 <iam.chi...@gmail.com> wrote:
> Thank you, Squyres, it is really useful!
> 
> On July 7, 2010, at 7:22 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Jul 6, 2010, at 10:48 PM, 张晶 wrote:
> 
> > 1. If I write an rlogin component,
> 
> Is the command line of rlogin that much different than that of rsh/ssh?  For 
> example, can you just s/rsh/rlogin/ on the overall command line and have it 
> just work?
> 
> If so, I suspect that tweaking the rsh plm might be far simpler than having 
> your own component.
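
As a sketch of what that substitution would mean (node name and orted options 
below are illustrative placeholders, not the actual command the PLM builds):

```shell
# Roughly the per-node launch line the rsh PLM constructs:
rsh node01 orted <orted options...>
# The question: does the same form still work with the launcher swapped?
rlogin node01 orted <orted options...>
```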
> 
> > can I just log in to each node in the cluster and launch the process?  If 
> > so, what role does the odls play?
> 
> ODLS = ORTE Daemon Local launch Subsystem.
> 
> PLM = Process Lifecycle Management.
> 
> Meaning: the PLM is used to launch orteds (more on this below) across 
> multiple nodes.  The ODLS is used to launch processes locally from the orted 
> (e.g., via POSIX fork/exec).
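
As an illustrative shell analogue of that local fork/exec step (using `true` 
as a stand-in for the application):

```shell
# The orted launches each local rank the POSIX way: fork a child,
# then exec the application image inside it.  Shell analogue:
( exec true ) &   # '&' forks a subshell; 'exec' replaces its image
wait $!           # the daemon waits on / monitors the child pid
st=$?
echo "child exited with status $st"
```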
> 
> > 2. What is orted?  Should the orted exist on every node and function as a 
> > node-local process launch proxy?
> 
> Yes.  The orted = ORTE daemon.  It is almost always the first thing launched 
> on each node and acts as a proxy for launching, killing, and monitoring the 
> user's applications on each node.  It also does other control kinds of 
> things, like relay stdout/stderr back up to the HNP (more below), etc.
> 
> > 3. What is hnp?  Does every job have only one hnp, and when I use mpirun, 
> > is the mpirun process the hnp?
> 
> HNP = head node process, meaning mpirun (or actually, orterun -- mpirun is a 
> symlink to orterun).  The HNP functions as an orted as well, so it can use 
> the ODLS to launch processes locally, etc.
> 
> Ralph can provide more detail on all of the above, but these are the basics.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> 张晶
