On Jul 9, 2010, at 3:23 PM, Jerome Soumagne wrote:

> Hi Ken,
>
> Thank you very much for your reply; I will think about it and do some more
> tests. I was only thinking about using MPI threads, but yes, as you say, if
> two threads are scheduled on the same core, that wouldn't be pretty at all.
> I can probably do some more tests of that functionality, but I don't expect
> great results.
>
> I'm not sure I correctly understand what you say about the spawn. I found a
> presentation on the web from Richard Graham saying that the spawn
> functionality is implemented, and the presentation also says that you get
> full MPI-2 support on the Cray XT. When I said that I had problems with the
> MPI_Comm_accept/connect functions, I meant that I actually get errors when
> I try to do a "simple" MPI_Open_port. Do you know where in the code I can
> find out whether this function is implemented or not? If it is implemented,
> knowing where it is defined would help me find the origin of my problem and
> possibly extend the support for this functionality (if that is feasible).
> I would like to be able to link two different jobs together using these
> functions, i.e. to create a communicator between the jobs.
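For reference, the accept/connect pattern described above (linking two
separately launched jobs through an intercommunicator) looks roughly like
the sketch below. How the port string travels between the two jobs is an
assumption made here purely for illustration: it is passed on the command
line, but MPI only requires some out-of-band exchange.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Minimal sketch: the first job opens a port and accepts; the second job
 * is started with the printed port string as argv[1] and connects. The
 * command-line hand-off of the port name is an assumption, not part of
 * the MPI standard. */
int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;                 /* intercommunicator between the jobs */

    MPI_Init(&argc, &argv);

    if (argc > 1) {
        /* client job: connect to the port name given on the command line */
        strncpy(port, argv[1], MPI_MAX_PORT_NAME - 1);
        port[MPI_MAX_PORT_NAME - 1] = '\0';
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    } else {
        /* server job: open a port, publish it by printing it, and wait */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port name: %s\n", port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        MPI_Close_port(port);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}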
It is implemented in ompi/mca/dpm/orte. I believe it isn't supported for the
reasons Ken described.

> Thanks,
>
> Jerome
>
> On 07/09/2010 07:16 PM, Matney Sr, Kenneth D. wrote:
>> Hello Jerome,
>>
>> The first one is simple: portals is not thread-safe on the Cray XT. As I
>> recall, only the master thread can post an event, although any thread can
>> receive the event. Though I might have it backwards; it has been a couple
>> of years since I played with this.
>>
>> The second one depends on how you use your Cray XT. In our case, the
>> machine is used process-per-core, i.e., not as a collection of SMPs. For
>> performance reasons, you definitely do not want MPI threads. Also, since
>> it is run process-per-core, there is nothing to be gained from progress
>> threads; Portals events will generate a kernel-level interrupt. Whether
>> you can run the XT as a cluster of SMPs is another question entirely. We
>> really have not tried this in the context of OMPI, but in conjunction
>> with portals, it might open a "can of worms". For example, any thread can
>> be run on any core, but the portals ID for a thread will be the NID/PID
>> pair for that core. If two threads get scheduled to the same core, it
>> would not be pretty.
>>
>> I could see lots of reasons why spawn might fail. First, it is run on a
>> compute node, and there is no way for a compute node to start a process
>> on another compute node. Also, there would be no rank/size initialization
>> forthcoming from ALPS. So even if it got past this, the child would be
>> running on the same node as its parent.
>> --
>> Ken Matney, Sr.
>> Oak Ridge National Laboratory
>>
>>
>> On Jul 9, 2010, at 7:53 AM, Jerome Soumagne wrote:
>>
>> Hi,
>>
>> As I said in the previous e-mail, we've recently installed Open MPI on a
>> Cray XT5 machine, and we therefore use the portals and alps libraries.
>> Thanks for providing the configuration script from Jaguar; it was very
>> helpful and only had to be slightly adapted to the latest CNL version
>> installed on this machine.
>>
>> I have some questions, though, regarding the use of the portals btl and
>> mtl components. I noticed that when I compiled Open MPI with MPI-thread
>> support enabled and ran a job, the portals components did not want to
>> initialize due to these funny lines:
>>
>> ./mtl_portals_component.c
>> 182     /* we don't run with no stinkin' threads */
>> 183     if (enable_progress_threads || enable_mpi_threads) return NULL;
>>
>> I'd like to know why MPI threads are disabled, since threads are
>> supported on the XT5. Does the btl/mtl require thread-safety to be
>> implemented, or is it a limitation of the portals library itself?
>>
>> I would also like to use the MPI_Comm_accept/connect functions. It seems
>> that this is not possible using the portals mtl, even though spawn seems
>> to be supported. Did I do something wrong, or is it really not supported?
>> In that case, would it be possible to extend this module to support these
>> functions? We could help with that.
>>
>> I'd also like to know whether there are any plans to create a module that
>> uses the DMAPP interface for the Gemini interconnect.
>>
>> Thanks.
>> Jerome
>>
>> --
>> Jérôme Soumagne
>> Scientific Computing Research Group
>> CSCS, Swiss National Supercomputing Centre
>> Galleria 2, Via Cantonale | Tel: +41 (0)91 610 8258
>> CH-6928 Manno, Switzerland | Fax: +41 (0)91 610 8282
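To make the thread-support question in the thread concrete: from the
application side, the MPI thread level is requested and granted at
initialization, and the enable_mpi_threads flag in the quoted snippet
reflects such a request. A minimal sketch of the standard MPI_Init_thread
handshake (generic MPI usage, not Open MPI internals):

#include <mpi.h>
#include <stdio.h>

/* Request fully multi-threaded MPI and check what the library grants.
 * If the granted level is lower than requested, the application must
 * restrict how it calls MPI from threads. */
int main(int argc, char **argv)
{
    int provided;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        printf("granted thread level %d, wanted %d; restricting MPI calls "
               "to the main thread\n", provided, MPI_THREAD_MULTIPLE);
    }

    MPI_Finalize();
    return 0;
}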