Been racking my brain on this, and I can't find any way to do it cleanly 
without some kind of extension or modification to the MPI-RTE interface.

The problem is that we are now executing an "in-band" modex. That is fine, but 
the modex (no matter how it is executed) is an RTE-dependent operation. Our 
current ompi_rte_modex function automatically performs it out-of-band, so we 
don't want to use it here. However, we currently lack any interface for 
directly obtaining endpoint info and/or for defining/setting locality.

There are several ways we could resolve the endpoint problem:

* define flags as I mentioned previously and modify the opal_db APIs to 
indicate "we want only non-RTE data"

* set a convention that all OMPI-level keys begin with a known substring like 
"ompi." - we could then simply call "fetch" with an "ompi.*" wildcard to 
retrieve all MPI-related data

* modify the ompi_modex_* routines to insert "ompi." at the beginning of all 
keys - this would require an asprintf call, which means a malloc (a rough 
sketch of this option appears after the list)

* add new functions "ompi_rte_get_endpoint_info" and 
"ompi_rte_set_endpoint_info", and let the RTEs figure out how to get/set the 
right data
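
As a rough illustration of the third option, here is a minimal sketch in plain 
C - the helper name and the example key are hypothetical, not the actual 
ompi_modex_* code - showing the "ompi." prefixing and the asprintf/malloc it 
implies:

-----
#define _GNU_SOURCE   /* for asprintf */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build an "ompi."-prefixed key so the db layer could
 * later fetch all MPI-level data with an "ompi.*" wildcard. */
static char *ompi_prefix_key(const char *key)
{
    char *prefixed = NULL;

    /* The asprintf (and hence malloc) mentioned above */
    if (0 > asprintf(&prefixed, "ompi.%s", key)) {
        return NULL;
    }
    return prefixed;   /* caller must free() */
}

int main(void)
{
    char *k = ompi_prefix_key("btl.tcp.addr");   /* made-up key for illustration */
    if (NULL != k) {
        printf("%s\n", k);   /* prints "ompi.btl.tcp.addr" */
        free(k);
    }
    return 0;
}
-----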


The locality issue is a little tougher. I can't think of any RTE-agnostic 
method for setting locality. Unless someone else can think of one, the only 
option I can propose is to add a new MPI-RTE interface, 
"ompi_rte_set_locality(proc)".
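
For concreteness, here is a sketch of what that interface might look like - the 
exact signature and return convention are just assumptions, not a settled API:

-----
struct ompi_proc_t;   /* opaque here; the real definition lives in ompi/proc */

/* Hypothetical prototype: ask the RTE to (re)compute and store the locality
 * of the given proc, e.g. after an in-band modex has told us where that peer
 * actually lives.  Returns an OMPI error code. */
int ompi_rte_set_locality(struct ompi_proc_t *proc);
-----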

Thoughts?
Ralph


On Sep 18, 2013, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Actually, we wouldn't have to modify the interface - we'd just have to define 
> a DB_RTE flag and OR it with the DB_INTERNAL/DB_EXTERNAL one. We'd need to 
> modify the "fetch" routines to pass the flag in so we fetch the right things, 
> but that's a simple change.
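
A minimal, purely illustrative sketch of that flag idea - the flag names/values 
and the little filter helper below are assumptions, not the real opal_db API:

-----
#include <stdio.h>

#define DB_INTERNAL  0x01   /* store locally only                    */
#define DB_EXTERNAL  0x02   /* push out to the system (e.g. PMI)     */
#define DB_RTE       0x04   /* hypothetical: marks RTE-owned data    */

/* A fetch routine extended to take the caller's flags could filter with
 * something like this: skip RTE-owned entries unless explicitly requested. */
static int want_key(int stored_flags, int requested_flags)
{
    if ((stored_flags & DB_RTE) && !(requested_flags & DB_RTE)) {
        return 0;
    }
    return 1;
}

int main(void)
{
    /* An entry stored as "internal | RTE" is filtered out when the caller
     * fetches without DB_RTE set; a plain internal entry is returned. */
    printf("%d\n", want_key(DB_INTERNAL | DB_RTE, DB_INTERNAL));   /* 0 */
    printf("%d\n", want_key(DB_INTERNAL, DB_INTERNAL));            /* 1 */
    return 0;
}
-----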
> 
> On Sep 18, 2013, at 10:12 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> I struggled with that myself when doing my earlier patch - part of the 
>> reason why I added the dpm API.
>> 
>> I don't know how to update the locality without referencing RTE-specific 
>> keys, so maybe the best thing would be to provide some kind of hook into the 
>> db that says we want all the non-RTE keys? Would be simple to add that 
>> capability, though we'd have to modify the interface so we specify "RTE key" 
>> when doing the initial store.
>> 
>> The "internal" flag is used to avoid re-sending data to the system under 
>> PMI. We "store" our data as "external" in the PMI components so the data 
>> gets pushed out, then fetch using PMI and store "internal" to put it in our 
>> internal hash. So "internal" doesn't mean "non-RTE".
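
A toy sketch of that flow - the store() stub, flag values, and key/value 
strings are placeholders, not the real opal_db/PMI entry points:

-----
#include <stdio.h>

#define DB_INTERNAL 0x01   /* keep in our local hash only       */
#define DB_EXTERNAL 0x02   /* push out to the system (e.g. PMI) */

/* Placeholder for the db "store" entry point. */
static void store(const char *key, const char *val, int flags)
{
    printf("store %-15s = %-15s (%s)\n", key, val,
           (flags & DB_EXTERNAL) ? "pushed to PMI" : "local hash only");
}

int main(void)
{
    /* 1. Our own endpoint data goes in as "external" so the PMI component
     *    pushes it out to the system. */
    store("endpoint[self]", "10.0.0.1:5000", DB_EXTERNAL);

    /* 2. Peer data comes back via a PMI get (elided here), and is then
     *    re-stored as "internal" so it lands in our local hash without
     *    being re-sent to the system - i.e., "internal" != "non-RTE". */
    store("endpoint[peer]", "10.0.0.2:5000", DB_INTERNAL);
    return 0;
}
-----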
>> 
>> 
>> On Sep 18, 2013, at 10:02 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>>> I hit send too early.
>>> 
>>> Now that we move the entire "local" modex, is there any way to trim it down 
>>> or to replace the entries that are no longer correct - like the locality?
>>> 
>>> George.
>>> 
>>> On Sep 18, 2013, at 18:53 , George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>>> Regarding your comment on the bug trac, I noticed there is a DB_INTERNAL 
>>>> flag. While I can see how to set it, I could not figure out any way to get 
>>>> it back.
>>>> 
>>>> With the required modification of the DB API, can't we take advantage of it?
>>>> 
>>>> George.
>>>> 
>>>> 
>>>> On Sep 18, 2013, at 18:52 , Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> Thanks George - much appreciated
>>>>> 
>>>>> On Sep 18, 2013, at 9:49 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> 
>>>>>> The test case was broken. I just pushed a fix.
>>>>>> 
>>>>>> George.
>>>>>> 
>>>>>> On Sep 18, 2013, at 16:49 , Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> 
>>>>>>> It hangs with any np > 1.
>>>>>>> 
>>>>>>> However, I'm not sure whether that's an issue with the test or with the 
>>>>>>> underlying implementation.
>>>>>>> 
>>>>>>> On Sep 18, 2013, at 7:40 AM, "Jeff Squyres (jsquyres)" 
>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>> 
>>>>>>>> Does it hang when you run with -np 4?
>>>>>>>> 
>>>>>>>> Sent from my phone. No type good. 
>>>>>>>> 
>>>>>>>> On Sep 18, 2013, at 4:10 PM, "Ralph Castain" <r...@open-mpi.org> wrote:
>>>>>>>> 
>>>>>>>>> Strange - it works fine for me on my Mac. However, I see one 
>>>>>>>>> difference - I only run it with np=1.
>>>>>>>>> 
>>>>>>>>> On Sep 18, 2013, at 2:22 AM, Jeff Squyres (jsquyres) 
>>>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>>>> 
>>>>>>>>>> On Sep 18, 2013, at 9:33 AM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 1. sm doesn't work between spawned processes. So you must have 
>>>>>>>>>>> another network enabled.
>>>>>>>>>> 
>>>>>>>>>> I know :-).  I have tcp available as well (OMPI will abort if you 
>>>>>>>>>> only run with sm,self because the comm_spawn will fail with 
>>>>>>>>>> unreachable errors -- I just tested/proved this to myself).
>>>>>>>>>> 
>>>>>>>>>>> 2. Don't use the test case attached to my email; I left in an 
>>>>>>>>>>> xterm-based spawn and the debugging, so it can't work without xterm 
>>>>>>>>>>> support. Instead, try the test case from the trunk, the one committed 
>>>>>>>>>>> by Ralph.
>>>>>>>>>> 
>>>>>>>>>> I didn't see any "xterm" strings in there, but ok.  :-)  I ran with 
>>>>>>>>>> orte/test/mpi/intercomm_create.c, and that hangs for me as well:
>>>>>>>>>> 
>>>>>>>>>> -----
>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 6]
>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 7]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 6]
>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 7]
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>> [hang]
>>>>>>>>>> -----
>>>>>>>>>> 
>>>>>>>>>> Similarly, on my Mac, it hangs with no output:
>>>>>>>>>> 
>>>>>>>>>> -----
>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>> [hang]
>>>>>>>>>> -----
>>>>>>>>>> 
>>>>>>>>>>> George.
>>>>>>>>>>> 
>>>>>>>>>>> On Sep 18, 2013, at 07:53 , "Jeff Squyres (jsquyres)" 
>>>>>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> George --
>>>>>>>>>>>> 
>>>>>>>>>>>> When I build the SVN trunk (r29201) on 64 bit linux, your attached 
>>>>>>>>>>>> test case hangs:
>>>>>>>>>>>> 
>>>>>>>>>>>> -----
>>>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 6]
>>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 7]
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 6]
>>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 7]
>>>>>>>>>>>> [hang]
>>>>>>>>>>>> -----
>>>>>>>>>>>> 
>>>>>>>>>>>> On my Mac, it hangs without printing anything:
>>>>>>>>>>>> 
>>>>>>>>>>>> -----
>>>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create   
>>>>>>>>>>>> [hang]
>>>>>>>>>>>> -----
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 18, 2013, at 1:48 AM, George Bosilca <bosi...@icl.utk.edu> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Here is a quick (and definitely not the cleanest) patch that 
>>>>>>>>>>>>> addresses the MPI_Intercomm issue at the MPI level. It should be 
>>>>>>>>>>>>> applied after removal of 29166.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also added a corrected test case that stresses the corner cases 
>>>>>>>>>>>>> by doing a barrier at every inter-comm creation and a clean 
>>>>>>>>>>>>> disconnect.
>>>>>>>>>>>> 
>>>>>>>>>>>> 