george.
On Apr 25, 2008, at 7:52 PM, Ralph Castain wrote:
On 4/25/08 5:38 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

To bounce on George's last remark: currently, when a job dies without unsubscribing a port via Unpublish (due to poor user programming, failure, or abort), ompi-server keeps the reference forever, and a new application therefore cannot publish under the same name again. So I guess this is a good point: all published/opened ports should be cleaned up correctly when the application ends (for whatever reason).

That's a good point - in my other note, all I had addressed was closing my local port. We should ensure that the pubsub framework does an unpublish of anything we put out there. I'll have to create a command to do that since pubsub doesn't actually track what it was asked to publish - we'll need something that tells both local and global data servers to "unpublish anything that came from me".

Another cool feature could be to have mpirun behave as an ompi-server, and publish a suitable URI if requested to do so (if the uri file does not exist yet?). I know from the source code that mpirun already includes everything needed to offer this feature, except the ability to provide a suitable URI.

Just to be sure I understand, since I think this is doable. Mpirun already serves as your "ompi-server" for any job it spawns - that is the purpose of the MPI_Info flag "local" instead of "global" when you publish information. You can always publish/lookup against your own mpirun.

What you are suggesting here is that we have each mpirun put its local data server port info somewhere that another job can find it, either in the already existing contact_info file, or perhaps in a separate "data server uri" file?

The only reason for concern here is the obvious race condition. Since mpirun only exists while a job is running, you could look up its contact info and attempt to publish/lookup to that mpirun, only to find it doesn't respond because it is either already dead or on its way out.
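The stale-reference problem Aurélien describes can be seen in the standard publish/unpublish pairing. A minimal sketch (the service name "ocean" is illustrative, not from the thread; running against a global ompi-server would additionally require an MPI_Info key selecting global scope):

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);

    /* register the port under a well-known name with the data server */
    MPI_Publish_name("ocean", MPI_INFO_NULL, port_name);

    /* ... accept connections, do work ... */

    /* if the job aborts before reaching this line, the name stays
     * registered in ompi-server forever and a later job cannot
     * publish "ocean" again - hence the cleanup-on-exit proposal */
    MPI_Unpublish_name("ocean", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);

    MPI_Finalize();
    return 0;
}
```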
Hence the notion of restricting inter-job operations to the system-level ompi-server.

If we can think of a way to deal with the race condition, I'm certainly willing to publish the contact info. I'm just concerned that you may find yourself "hung" if that mpirun goes away unexpectedly - say, right in the middle of a publish/lookup operation.

Ralph

Aurelien

On Apr 25, 2008, at 7:19 PM, George Bosilca wrote:

Ralph,

Thanks for your concern regarding the level of compliance of our implementation of the MPI standard. I don't know who the MPI gurus you talked with about this issue were, but I can tell that for once the MPI standard is pretty clear about this. As Aurelien stated in his last email, the use of the plural in several sentences strongly suggests that the status of a port should not be implicitly modified by MPI_Comm_accept or MPI_Comm_connect. Moreover, at the beginning of the chapter, the MPI standard specifies that connect/accept work exactly as in TCP. In other words, once the port is opened it stays open until the user explicitly closes it.

However, not all corner cases are addressed by the MPI standard. What happens on MPI_Finalize? That's a good question. Personally, I think we should stick with the TCP similarities. The port should be not only closed but unpublished. This will solve all issues with people trying to look up a port once the originator is gone.

george.

On Apr 25, 2008, at 5:25 PM, Ralph Castain wrote:

As I said, it makes no difference to me. I just want to ensure that everyone agrees on the interpretation of the MPI standard. We have had these discussions in the past, with differing views. My guess here is that the port was left open mostly because the person who wrote the C binding forgot to close it. ;-)

So, you MPI folks: do we allow multiple connections against a single port, and leave the port open until explicitly closed? If so, do we generate an error if someone calls MPI_Finalize without first closing the port?
Or do we automatically close any open ports when finalize is called? Or do we automatically close the port after the connect/accept is completed?

Thanks
Ralph

On 4/25/08 3:13 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

Actually, the port was still left open forever before the change. The bug damaged the port string, so it was not usable anymore - not only in subsequent Comm_accept, but also in Close_port or Unpublish_name.

To answer your open-port concern more specifically: if the user does not want the port to remain open, he should explicitly call MPI_Close_port and not rely on MPI_Comm_accept to close it. Actually, the standard suggests the exact contrary: section 5.4.2 states "it must call MPI_Open_port to establish a port [...] it must call MPI_Comm_accept to accept connections from clients". Because there are multiple clients AND multiple connections in that sentence, I assume the port can be used in multiple accepts.

Aurelien

On Apr 25, 2008, at 4:53 PM, Ralph Castain wrote:

Hmmm... just to clarify, this wasn't a "bug". It was my understanding per the MPI folks that a separate, unique port had to be created for every invocation of Comm_accept. They didn't want a port hanging around open, and their plan was to close the port immediately after the connection was established. So dpm_orte was written to that specification. When I reorganized the code, I left the logic as it had been written - which was actually done by the MPI side of the house, not me.

I have no problem with making the change. However, since the specification was created on the MPI side, I just want to make sure that the MPI folks all realize this has now been changed. Obviously, if this change in spec is adopted, someone needs to make sure that the C and Fortran bindings do -not- close that port any more!
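Aurelien's reading of section 5.4.2 implies the server pattern below, in which a single opened port services several successive clients. This is a sketch, assuming the r18303 fix so the port string survives each accept; the client count is arbitrary:

```c
#include <mpi.h>

#define NUM_CLIENTS 3  /* illustrative: serve three connect requests */

int main(int argc, char *argv[])
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm clients[NUM_CLIENTS];
    int i;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);

    for (i = 0; i < NUM_CLIENTS; i++) {
        /* the same port_name is reused for every accept; nothing in
         * the standard's wording closes it implicitly */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0,
                        MPI_COMM_SELF, &clients[i]);
    }

    for (i = 0; i < NUM_CLIENTS; i++) {
        MPI_Comm_disconnect(&clients[i]);
    }

    /* per this reading, the user closes the port, not the accept */
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```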
Ralph

On 4/25/08 2:41 PM, "boute...@osl.iu.edu" <boute...@osl.iu.edu> wrote:

Author: bouteill
Date: 2008-04-25 16:41:44 EDT (Fri, 25 Apr 2008)
New Revision: 18303
URL: https://svn.open-mpi.org/trac/ompi/changeset/18303

Log:
Fix a bug that prevented using the same port (as returned by Open_port) for several Comm_accept.

Text files modified:
   trunk/ompi/mca/dpm/orte/dpm_orte.c | 19 ++++++++++---------
   1 files changed, 10 insertions(+), 9 deletions(-)

Modified: trunk/ompi/mca/dpm/orte/dpm_orte.c
==============================================================================
--- trunk/ompi/mca/dpm/orte/dpm_orte.c (original)
+++ trunk/ompi/mca/dpm/orte/dpm_orte.c 2008-04-25 16:41:44 EDT (Fri, 25 Apr 2008)
@@ -848,8 +848,14 @@
 {
     char *tmp_string, *ptr;

+    /* copy the RML uri so we can return a malloc'd value
+     * that can later be free'd
+     */
+    tmp_string = strdup(port_name);
+
     /* find the ':' demarking the RML tag we added to the end */
-    if (NULL == (ptr = strrchr(port_name, ':'))) {
+    if (NULL == (ptr = strrchr(tmp_string, ':'))) {
+        free(tmp_string);
         return NULL;
     }
@@ -863,15 +869,10 @@
     /* see if the length of the RML uri is too long - if so,
      * truncate it
      */
-    if (strlen(port_name) > MPI_MAX_PORT_NAME) {
-        port_name[MPI_MAX_PORT_NAME] = '\0';
+    if (strlen(tmp_string) > MPI_MAX_PORT_NAME) {
+        tmp_string[MPI_MAX_PORT_NAME] = '\0';
     }
-
-    /* copy the RML uri so we can return a malloc'd value
-     * that can later be free'd
-     */
-    tmp_string = strdup(port_name);
-
+
     return tmp_string;
 }

_______________________________________________
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel