Ralph,

I have not had much success using ompi-server. Here is what I tried:

xterm1$ ompi-server --debug-devel -d --report-uri test
[grosse-pomme.local:01097] proc_info: hnp_uri NULL
        daemon uri NULL
[grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!


xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
Port name: 2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300

xterm3$ mpirun -ompi-server test  -np 1 simple_connect
--------------------------------------------------------------------------
Process rank 0 attempted to lookup from a global ompi_server that
could not be contacted. This is typically caused by either not
specifying the contact info for the server, or by the server not
currently executing. If you did specify the contact info for a
server, please check to see that the server is running and start
it again (or have your sys admin start it) if it isn't.

--------------------------------------------------------------------------
[grosse-pomme.local:01122] *** An error occurred in MPI_Lookup_name
[grosse-pomme.local:01122] *** on communicator MPI_COMM_WORLD
[grosse-pomme.local:01122] *** MPI_ERR_NAME: invalid name argument
[grosse-pomme.local:01122] *** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------



The server code calls Open_port and then Publish_name. It looks like the
Lookup_name function cannot reach the ompi-server. In debug mode, the
ompi-server does not show any output when a new event occurs (like when the
server is launched). Is there something wrong with the way I am using it?

Aurelien

On 3 Apr 08, at 17:21, Ralph Castain wrote:
Take a gander at ompi/tools/ompi-server - I believe I put a man page in
there. You might just try "man ompi-server" and see if it shows up.

Holler if you have a question - not sure I documented it very thoroughly at
the time.


On 4/3/08 3:10 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

Ralph,


I am using trunk. Is there any documentation for ompi-server? It sounds
like exactly what I need to fix point 1.

Aurelien

On 3 Apr 08, at 17:06, Ralph Castain wrote:
I guess I'll have to ask the basic question: what version are you using?

If you are talking about the trunk, there no longer is a "universe" concept
anywhere in the code. Two mpiruns can connect/accept to each other as long
as they can make contact. To facilitate that, we created an "ompi-server"
tool that is supposed to be run by the sys-admin (or a user, doesn't matter
which) on the head node - there are various ways to tell mpirun how to
contact the server, or it can self-discover it.

I have tested publish/lookup pretty thoroughly and it seems to work. I
haven't spent much time testing connect/accept except via comm_spawn, which
seems to be working. Since that uses the same mechanism, I would have
expected connect/accept to work as well.

If you are talking about 1.2.x, then the story is totally different.

Ralph



On 4/3/08 2:29 PM, "Aurélien Bouteiller" <boute...@eecs.utk.edu> wrote:

Hi everyone,

I'm trying to figure out how complete the implementation of
Comm_connect/Accept is. I found two problematic cases.

1) Two different programs are started by two different mpiruns. One
calls accept, the other calls connect. I would not expect
MPI_Publish_name/Lookup_name to work because the two do not share an
HNP. Still, I would expect to be able to connect by copying (with
printf-scanf) the port_name string generated by Open_port, especially
considering that in Open MPI the port_name is a string containing the
TCP address and port of rank 0 in the server communicator. However,
doing so results in "no route to host" and the connecting application
aborts. Is the problem related to an explicit check of the universes
on the accept HNP? Do I expect too much from the MPI standard? Is it
because my two applications do not share the same universe? Should we
(re-)add the ability to use the same universe for several mpiruns?

2) The second issue arises when the program sets up a port and then
accepts multiple clients on this port. Everything works fine for the
first client, and then accept stalls forever while waiting for the
second one. My understanding of the standard is that it should work:
5.4.2 states "it must call MPI_Open_port to establish a port [...] it
must call MPI_Comm_accept to accept connections from clients". I
understand this to mean that with one MPI_Open_port I should be able
to manage several MPI clients. Am I understanding the standard
correctly here, and should we fix this?

Here is a copy of the non-working code for reference.

/*
 * Copyright (c) 2004-2007 The Trustees of the University of Tennessee.
 *                         All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
   char port[MPI_MAX_PORT_NAME];
   int rank;
   int np;


   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   if(rank)
   {
       MPI_Comm comm;
       /* client */
       MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0,
                MPI_COMM_WORLD, MPI_STATUS_IGNORE);
       printf("Read port: %s\n", port);
       MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                        &comm);

       MPI_Send(&rank, 1, MPI_INT, 0, 1, comm);
       MPI_Comm_disconnect(&comm);
   }
   else
   {
       int nc = np - 1;
       MPI_Comm *comm_nodes = (MPI_Comm *) calloc(nc, sizeof(MPI_Comm));
       MPI_Request *reqs = (MPI_Request *) calloc(nc, sizeof(MPI_Request));
       int *event = (int *) calloc(nc, sizeof(int));
       int i;

       MPI_Open_port(MPI_INFO_NULL, port);
       /* MPI_Publish_name("test_service_el", MPI_INFO_NULL, port); */
       printf("Port name: %s\n", port);
       for(i = 1; i < np; i++)
           MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, i, 0,
                    MPI_COMM_WORLD);

       for(i = 0; i < nc; i++)
       {
           MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                           &comm_nodes[i]);
           printf("Accept %d\n", i);
           MPI_Irecv(&event[i], 1, MPI_INT, 0, 1, comm_nodes[i],
                     &reqs[i]);
           printf("IRecv %d\n", i);
       }
       MPI_Close_port(port);
       MPI_Waitall(nc, reqs, MPI_STATUSES_IGNORE);
       for(i = 0; i < nc; i++)
       {
           printf("event[%d] = %d\n", i, event[i]);
           MPI_Comm_disconnect(&comm_nodes[i]);
           printf("Disconnect %d\n", i);
       }
   }

   MPI_Finalize();
   return EXIT_SUCCESS;
}
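
For point 1, the printf-scanf copy of the port name on the client side can be sketched as follows. This is only an illustration: read_port_name is a hypothetical helper, not part of MPI or Open MPI, and it assumes the port name printed by the server is handed to the client as a single line of text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line holding a port name from `in` into
 * `port` (capacity `cap` bytes) and drop the trailing newline.
 * Returns 0 on success, -1 on EOF or read error, on an empty line, or
 * if the line may have been truncated because the buffer was too small. */
int read_port_name(FILE *in, char *port, size_t cap)
{
    size_t n;

    if (cap == 0 || fgets(port, (int)cap, in) == NULL)
        return -1;

    /* Index of the first CR or LF, or of the terminating NUL if none. */
    n = strcspn(port, "\r\n");
    if (port[n] == '\0' && !feof(in)) {
        /* No newline was seen and we are not at EOF: the buffer filled up. */
        return -1;
    }
    port[n] = '\0';
    return n > 0 ? 0 : -1;
}
```

The client would call it with a buffer of MPI_MAX_PORT_NAME bytes, for example on stdin, and pass the resulting string unchanged to MPI_Comm_connect. Reading a whole line rather than using scanf("%s") keeps the string intact even if a port name ever contained a space.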




--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321





_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


