This sounds like excellent progress! Jeff and others know much more about MTT than I do, so I'll leave that question to them.

You have two approaches to the mmap issue. The easiest for now would be to simply disable the shared memory component: you can either turn it off at run time with -mca btl ^sm, or direct that it not even be built with --enable-mca-no-build=btl-sm when configuring OMPI. I would think your TCP communication would then allow the two procs sharing a host to communicate. Can you give that a try?
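For reference, a minimal version of the send/recv test you describe might look like the sketch below; the file name, tag, and payload are arbitrary. Running it on a single host with something like "mpirun -np 2 -mca btl ^sm ./sendrecv" should exercise the TCP path.

    /* sendrecv.c - minimal two-process send/recv test (illustrative).
     * Build: mpicc sendrecv.c -o sendrecv
     * Run:   mpirun -np 2 -mca btl ^sm ./sendrecv
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 42;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (0 == rank) {
            /* Rank 0 sends a single int to rank 1 (tag 0). */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            printf("rank 0: sent %d\n", value);
        } else if (1 == rank) {
            int recvd = 0;
            /* Rank 1 receives the int from rank 0. */
            MPI_Recv(&recvd, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1: received %d\n", recvd);
        }

        MPI_Finalize();
        return 0;
    }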
I'd be happy to begin reviewing the changes, and can help integrate them back into the OMPI trunk, when you feel ready.

On Aug 12, 2010, at 9:35 PM, 张晶 wrote:

> Hi Ralph, Jeff, and all,
>
> The good news is that I can almost run Open MPI on VxWorks, but there
> are still some bugs. The latest test that passed is: the rank 0 process
> calls mpi_send running on host 0, and the rank 1 process calls mpi_recv
> running on host 1. It works well. Because mmap, which the btl sm
> component uses, is absent on VxWorks, running two processes on the same
> host still fails.
>
> The differences between VxWorks and Unix are the real trouble. For
> example, pipe(), fork(), exec(), socketpair(), fcntl(), sshd, and so on
> are not implemented on VxWorks. Replacing these missing pieces with
> corresponding functions is the key work of the port. After getting a
> clear understanding of what the rsh component does, I wrote a simple
> daemon and client to launch the orted in place of the rlogin() call in
> VxWorks user space.
>
> I think there are still many tests that need to be run. Maybe I'd
> better look into MTT.
>
> On July 8, 2010, at 9:54 AM, 张晶 <iam.chi...@gmail.com> wrote:
> Thank you, Squyres, that is really useful!
>
> On July 7, 2010, at 7:22 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Jul 6, 2010, at 10:48 PM, 张晶 wrote:
>
> > 1. If I write an rlogin component,
>
> Is the command line of rlogin that much different than that of rsh/ssh?
> For example, can you just s/rsh/rlogin/ on the overall command line and
> have it just work?
>
> If so, I suspect that tweaking the rsh plm might be far simpler than
> writing your own component.
>
> > can I just log in to the node in the cluster and launch the process?
> > If so, what role does the odls play?
>
> ODLS = ORTE Daemon Local launch Subsystem.
>
> PLM = Process Lifecycle Management.
>
> Meaning: the PLM is used to launch orteds (more on this below) across
> multiple nodes. The ODLS is used to launch processes locally from the
> orted (e.g., via POSIX fork/exec).
>
> > 2. What is orted? Should the orted exist on every node and function
> > as a node process launch proxy?
>
> Yes. The orted = ORTE daemon. It is almost always the first thing
> launched on each node and acts as a proxy for launching, killing, and
> monitoring the user's applications on each node. It also does other
> control kinds of things, like relaying stdout/stderr back up to the HNP
> (more below), etc.
>
> > 3. What is the HNP? Does every job have only one HNP, and when I use
> > mpirun, is the mpirun process the HNP?
>
> HNP = head node process, meaning mpirun (or actually, orterun -- mpirun
> is a symlink to orterun). The HNP functions as an orted as well, so it
> can use the ODLS to launch processes locally, etc.
>
> Ralph can provide more detail on all of the above, but these are the
> basics.
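To make the ODLS description above concrete, here is a bare-bones sketch of the POSIX fork/exec launch pattern Jeff refers to. It illustrates only the general pattern, not OMPI's actual ODLS code, and it launches "hostname" as a stand-in application. As 张晶 notes, fork()/exec() are exactly the calls VxWorks lacks, which is why this path needs replacing there.

    /* local_launch.c - illustrative fork/exec local launch, in the
     * spirit of what an ODLS component does on each node (not OMPI
     * source code).
     */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (0 == pid) {
            /* Child: replace this process image with the application
             * ("hostname" here is just a placeholder). */
            execlp("hostname", "hostname", (char *) NULL);
            perror("execlp");   /* reached only if exec fails */
            _exit(127);
        }

        /* Parent (the daemon) monitors the child it launched. */
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            printf("child exited with status %d\n", WEXITSTATUS(status));
        }
        return 0;
    }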