Been racking my brain on this, and I can't find any way to do this cleanly without invoking some kind of extension/modification to the MPI-RTE interface.
The problem is that we are now executing an "in-band" modex operation. This is fine, but the modex operation (no matter how it is executed) is an RTE-dependent operation. Our current ompi_rte_modex function automatically performs it out-of-band, so we don't want to use it here. However, we currently lack any interface for directly obtaining endpoint info and/or for defining/setting locality.

There are several ways we could resolve the endpoint problem:

* define flags as I mentioned previously and modify the opal_db APIs to indicate "we want only non-RTE data"

* set a convention that all OMPI-level data begin with a known substring like "ompi." - we could then simply call "fetch" with an "ompi.*" wildcard to retrieve all MPI-related data

* modify the ompi_modex_* routines to insert "ompi." at the beginning of all keys - this would require an asprintf call, which means a malloc

* add new functions "ompi_rte_get_endpoint_info" and "ompi_rte_set_endpoint_info", and let the RTEs figure out how to get/set the right data

The locality issue is a little tougher. I can't think of any RTE-agnostic method for setting locality. Unless someone else can, the only option I can propose is to add a new MPI-RTE interface "ompi_rte_set_locality(proc)". (A rough sketch of what these could look like is appended below the quoted thread.)

Thoughts?
Ralph

On Sep 18, 2013, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Actually, we wouldn't have to modify the interface - just have to define a DB_RTE flag and OR it to the DB_INTERNAL/DB_EXTERNAL one. We'd need to modify the "fetch" routines to pass the flag into them so we fetched the right things, but that's a simple change.
> 
> On Sep 18, 2013, at 10:12 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> I struggled with that myself when doing my earlier patch - part of the reason why I added the dpm API.
>> 
>> I don't know how to update the locality without referencing RTE-specific keys, so maybe the best thing would be to provide some kind of hook into the db that says we want all the non-RTE keys? Would be simple to add that capability, though we'd have to modify the interface so we specify "RTE key" when doing the initial store.
>> 
>> The "internal" flag is used to avoid re-sending data to the system under PMI. We "store" our data as "external" in the PMI components so the data gets pushed out, then fetch using PMI and store "internal" to put it in our internal hash. So "internal" doesn't mean "non-RTE".
>> 
>> On Sep 18, 2013, at 10:02 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>>> I hit send too early.
>>> 
>>> Now that we move the entire "local" modex, is there any way to trim it down or to replace the entries that are not correct anymore? Like the locality?
>>> 
>>> George.
>>> 
>>> On Sep 18, 2013, at 18:53 , George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>>> Regarding your comment on the bug trac, I noticed there is a DB_INTERNAL flag. While I see how to set it, I could not figure out any way to get it back.
>>>> 
>>>> With the required modification of the DB API, can't we take advantage of it?
>>>> 
>>>> George.
>>>> 
>>>> On Sep 18, 2013, at 18:52 , Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> Thanks George - much appreciated
>>>>> 
>>>>> On Sep 18, 2013, at 9:49 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> 
>>>>>> The test case was broken. I just pushed a fix.
>>>>>> 
>>>>>> George.
>>>>>> 
>>>>>> On Sep 18, 2013, at 16:49 , Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> 
>>>>>>> Hangs with any np > 1
>>>>>>> 
>>>>>>> However, I'm not sure if that's an issue with the test vs the underlying implementation
>>>>>>> 
>>>>>>> On Sep 18, 2013, at 7:40 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>>>>> 
>>>>>>>> Does it hang when you run with -np 4?
>>>>>>>> 
>>>>>>>> Sent from my phone. No type good.
>>>>>>>> 
>>>>>>>> On Sep 18, 2013, at 4:10 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>>>>>>> 
>>>>>>>>> Strange - it works fine for me on my Mac. However, I see one difference - I only run it with np=1
>>>>>>>>> 
>>>>>>>>> On Sep 18, 2013, at 2:22 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>>>>>> 
>>>>>>>>>> On Sep 18, 2013, at 9:33 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 1. sm doesn't work between spawned processes. So you must have another network enabled.
>>>>>>>>>> 
>>>>>>>>>> I know :-). I have tcp available as well (OMPI will abort if you only run with sm,self because the comm_spawn will fail with unreachable errors -- I just tested/proved this to myself).
>>>>>>>>>> 
>>>>>>>>>>> 2. Don't use the test case attached to my email, I left an xterm based spawn and the debugging. It can't work without xterm support. Instead try using the test case from the trunk, the one committed by Ralph.
>>>>>>>>>> 
>>>>>>>>>> I didn't see any "xterm" strings in there, but ok. :-) I ran with orte/test/mpi/intercomm_create.c, and that hangs for me as well:
>>>>>>>>>> 
>>>>>>>>>> -----
>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 6]
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 7]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 6]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 7]
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> [hang]
>>>>>>>>>> -----
>>>>>>>>>> 
>>>>>>>>>> Similarly, on my Mac, it hangs with no output:
>>>>>>>>>> 
>>>>>>>>>> -----
>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>> [hang]
>>>>>>>>>> -----
>>>>>>>>>> 
>>>>>>>>>>> George.
>>>>>>>>>>> 
>>>>>>>>>>> On Sep 18, 2013, at 07:53 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> George --
>>>>>>>>>>>> 
>>>>>>>>>>>> When I build the SVN trunk (r29201) on 64 bit linux, your attached test case hangs:
>>>>>>>>>>>> 
>>>>>>>>>>>> -----
>>>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 6]
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 7]
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 6]
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 7]
>>>>>>>>>>>> [hang]
>>>>>>>>>>>> -----
>>>>>>>>>>>> 
>>>>>>>>>>>> On my Mac, it hangs without printing anything:
>>>>>>>>>>>> 
>>>>>>>>>>>> -----
>>>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>>>> [hang]
>>>>>>>>>>>> -----
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 18, 2013, at 1:48 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Here is a quick (and definitively not the cleanest) patch that addresses the MPI_Intercomm issue at the MPI level. It should be applied after removal of 29166.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also added the corrected test case stressing the corner cases by doing barriers at every inter-comm creation and doing a clean disconnect.
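To make the proposal at the top of this mail a little more concrete, here is a minimal sketch (nothing that exists in the trunk) of the pieces being discussed: prototypes for the proposed ompi_rte_get/set_endpoint_info and ompi_rte_set_locality entry points, the DB_RTE-style scope flag from the quoted mail, and the "ompi." key-prefix option with its asprintf/malloc cost. All of the types, signatures, and constant values below are assumptions for illustration only - the real versions would use the actual ompi_proc_t and opal_db definitions.

-----
/* Sketch only - placeholder types stand in for the real OMPI structures. */
#define _GNU_SOURCE              /* for asprintf() */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct { uint64_t name; uint16_t locality; } sketch_proc_t;

/* Proposed new MPI-RTE entry points (names from the proposal above;
 * the argument lists are guesses). Each RTE would supply its own
 * implementation for getting/setting the right data. */
int ompi_rte_set_endpoint_info(sketch_proc_t *proc, const char *key,
                               const void *data, size_t size);
int ompi_rte_get_endpoint_info(sketch_proc_t *proc, const char *key,
                               void **data, size_t *size);
int ompi_rte_set_locality(sketch_proc_t *proc, uint16_t locality);

/* Alternative from the quoted mail: a scope flag OR'd with the existing
 * internal/external ones so "fetch" can skip RTE-owned keys.
 * The values here are illustrative, not the real opal_db constants. */
#define SKETCH_DB_INTERNAL  0x01
#define SKETCH_DB_EXTERNAL  0x02
#define SKETCH_DB_RTE       0x04

/* Key-prefix option: tag every MPI-level key with "ompi." so a later fetch
 * can use an "ompi.*" wildcard. Note the asprintf(), i.e. one malloc per
 * key, which is the cost called out above. Caller must free() the result. */
static char *make_ompi_key(const char *key)
{
    char *prefixed = NULL;
    if (asprintf(&prefixed, "ompi.%s", key) < 0) {
        return NULL;
    }
    return prefixed;
}

int main(void)
{
    char *key = make_ompi_key("btl.tcp.addr");   /* example key, made up */
    if (NULL != key) {
        printf("would store under: %s\n", key);
        free(key);
    }
    return 0;
}
-----

Whichever way the endpoint data gets tagged, the trade-off is roughly as described above: a scope flag costs an API change to store/fetch, while the prefix convention costs a malloc per modex key.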