On Apr 29, 2009, at 1:38 PM, Jerry Ye wrote:
I’m currently working in an environment where I cannot use SSH to launch child processes. Instead, the process with rank 0 skips the ssh_child function in plm_rsh_module.c and the child processes are all started at the same time on different machines. Coordination is done with static jobids and ports. I have successfully modified the code to get the hello_c example working.
Excellent. What mechanism are you using to start your jobs? Would it be easier to fork the rsh plm into your own plugin? Is this code you can share with the community?
However, I’m having problems with inter-process communication when using MPI_Bcast. Is there something else that I’m obviously missing?
The PLM just starts up jobs -- other plugins are used for MPI communications. E.g., the TCP BTL is probably what you're using for MPI communications. Is that where it's failing?
--
Jeff Squyres
Cisco Systems