Thanks George. I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about this.
On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote:

> Thanks to all of you for your time and prompt responses.
>
> After applying the patch, SM can be used by raising its priority. That is
> enough for me (I hope so). But it continues to fail when I specify --mca coll
> sm,self on the command line (with tuned too).
> I am not going to use this release in production, only for playing with the
> code :-)
>
> Regards,
> Juan Antonio.
>
> On 4 Jul 2012, at 02:59, George Bosilca wrote:
>
>> Juan,
>>
>> Something weird is going on there. The selection mechanism for the SM coll
>> and SM BTL should be very similar. However, the SM BTL successfully selects
>> itself while the SM coll fails to determine that all processes are local.
>>
>> In the coll SM the issue is that the remote procs do not have the LOCAL flag
>> set, even when they are on the local node (however the ompi_proc_local()
>> return has a special flag stating that all processes in the job are local).
>> I compared the initialization of the SM BTL and the SM coll. It turns out
>> that somehow the procs returned by ompi_proc_all() and the procs provided to
>> the add_proc of the BTLs are not identical. The latter have the local flag
>> correctly set, so I went a little bit deeper.
>>
>> Here is what I found while toying with gdb inside:
>>
>> Breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false,
>> enable_mpi_threads=false) at coll_sm_module.c:132
>>
>> (gdb) p procs[0]
>> $1 = (ompi_proc_t *) 0x109a1e8c0
>> (gdb) p procs[1]
>> $2 = (ompi_proc_t *) 0x109a1e970
>> (gdb) p procs[0]->proc_flags
>> $3 = 0
>> (gdb) p procs[1]->proc_flags
>> $4 = 4095
>>
>> Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2,
>> procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at
>> btl_sm.c:427
>>
>> (gdb) p procs[0]
>> $5 = (struct ompi_proc_t *) 0x109a1e8c0
>> (gdb) p procs[1]
>> $6 = (struct ompi_proc_t *) 0x109a1e970
>> (gdb) p procs[0]->proc_flags
>> $7 = 1920
>> (gdb) p procs[1]->proc_flags
>> $8 = 4095
>>
>> Thus the problem seems to come from the fact that during the initialization
>> of the SM coll the flags are not correctly set. However, this is somehow
>> expected … as the call to the initialization happens before the exchange of
>> the business cards (and therefore there is no way to have any knowledge
>> about the remote procs).
>>
>> So, either something changed drastically in the way we set the flags for
>> remote processes or we did not use the SM coll for the last 3 years. I think
>> the culprit is r21967 (https://svn.open-mpi.org/trac/ompi/changeset/21967),
>> which added a "selection" logic based on knowledge about remote procs in the
>> coll SM initialization function. But this selection logic was way too early!
>>
>> I would strongly encourage you not to use this SM collective component in
>> anything related to production runs.
>>
>> george.
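
Inline note on the ordering problem George describes above: the following is a minimal, self-contained C sketch, not Open MPI code. Every name in it (demo_proc_t, DEMO_PROC_ON_NODE, demo_exchange_locality, demo_sm_usable) is hypothetical, and the flag value is made up for illustration. It only models the idea that a locality check run before the business-card exchange sees proc_flags == 0 for the peer and disqualifies the component, while the same check run afterwards succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical locality bit; NOT the real Open MPI flag value. */
#define DEMO_PROC_ON_NODE 0x1u

/* Hypothetical stand-in for a process descriptor. */
typedef struct {
    uint32_t proc_flags; /* stays 0 until locality info has been exchanged */
} demo_proc_t;

/* Count peers (every entry except `self`) that are marked node-local. */
static size_t demo_count_local_peers(const demo_proc_t *procs, size_t n,
                                     size_t self)
{
    size_t count = 0;
    assert(self < n);
    for (size_t i = 0; i < n; ++i) {
        if (i != self && (procs[i].proc_flags & DEMO_PROC_ON_NODE)) {
            ++count;
        }
    }
    return count;
}

/* Stand-in for the business-card exchange. In this toy model all procs
 * share one node, so afterwards every entry carries the locality bit. */
static void demo_exchange_locality(demo_proc_t *procs, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        procs[i].proc_flags |= DEMO_PROC_ON_NODE;
    }
}

/* Mirrors the kind of "any other local procs?" test that the patch
 * disables: usable only if at least one peer is flagged as local. */
static bool demo_sm_usable(const demo_proc_t *procs, size_t n, size_t self)
{
    return demo_count_local_peers(procs, n, self) > 0;
}
```

Calling demo_sm_usable() on a freshly zeroed two-entry array returns false, which corresponds to the "no other local procs; disqualifying myself" message in the logs; calling it again after demo_exchange_locality() returns true. Under these assumptions, the check itself is fine and only its position in the startup sequence is wrong.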
>>
>> PS: However, if you want to toy with the SM coll, apply the following patch:
>>
>> Index: coll_sm_module.c
>> ===================================================================
>> --- coll_sm_module.c (revision 26737)
>> +++ coll_sm_module.c (working copy)
>> @@ -128,6 +128,7 @@
>>  int mca_coll_sm_init_query(bool enable_progress_threads,
>>                             bool enable_mpi_threads)
>>  {
>> +#if 0
>>      ompi_proc_t *my_proc, **procs;
>>      size_t i, size;
>>
>> @@ -158,7 +159,7 @@
>>                              "coll:sm:init_query: no other local procs; disqualifying myself");
>>          return OMPI_ERR_NOT_AVAILABLE;
>>      }
>> -
>> +#endif
>>      /* Don't do much here because we don't really want to allocate any
>>         shared memory until this component is selected to be used. */
>>      opal_output_verbose(10, mca_coll_base_output,
>>
>>
>> On Jul 4, 2012, at 02:05 , Ralph Castain wrote:
>>
>>> Okay, please try this again with r26739 or above. You can remove the rest
>>> of the "verbose" settings and the --display-map so we declutter the output.
>>> Please add "-mca orte_nidmap_verbose 20" to your cmd line.
>>>
>>> Thanks!
>>> Ralph
>>>
>>>
>>> On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico <jar...@unex.es> wrote:
>>> Here is the output.
>>> >>> [jarico@Metropolis-01 examples]$ >>> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core >>> --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca >>> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca >>> mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 >>> -mca grpcomm_base_verbose 5 ./bmem >>> [Metropolis-01:24563] mca: base: components_open: Looking for hwloc >>> components >>> [Metropolis-01:24563] mca: base: components_open: opening hwloc components >>> [Metropolis-01:24563] mca: base: components_open: found loaded component >>> hwloc142 >>> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has no >>> register function >>> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has no >>> open function >>> [Metropolis-01:24563] hwloc:base:get_topology >>> [Metropolis-01:24563] hwloc:base: no cpus specified - using root available >>> cpuset >>> [Metropolis-01:24563] mca:base:select:(grpcomm) Querying component [bad] >>> [Metropolis-01:24563] mca:base:select:(grpcomm) Query of component [bad] >>> set priority to 10 >>> [Metropolis-01:24563] mca:base:select:(grpcomm) Selected component [bad] >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:receive start comm >>> -------------------------------------------------------------------------- >>> WARNING: a request was made to bind a process. While the system >>> supports binding the process itself, at least one node does NOT >>> support binding memory to the process location. >>> >>> Node: Metropolis-01 >>> >>> This is a warning only; your job will continue, though performance may >>> be degraded. 
>>> -------------------------------------------------------------------------- >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base:get_nbojbs computed data 8 of Core:0 >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24563] hwloc:base: get available cpus >>> [Metropolis-01:24563] hwloc:base:filter_cpus specified - already done >>> >>> ======================== JOB MAP ======================== >>> >>> Data for node: Metropolis-01 Num procs: 2 >>> Process OMPI jobid: [36265,1] App: 0 Process rank: 0 >>> Process OMPI jobid: [36265,1] App: 0 Process rank: 1 >>> >>> ============================================================= >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job [36265,0] >>> tag 1 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:xcast updating daemon 
>>> nidmap >>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient list >>> is empty! >>> [Metropolis-01:24564] mca: base: components_open: Looking for hwloc >>> components >>> [Metropolis-01:24564] mca: base: components_open: opening hwloc components >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> hwloc142 >>> [Metropolis-01:24564] mca: base: components_open: component hwloc142 has no >>> register function >>> [Metropolis-01:24564] mca: base: components_open: component hwloc142 has no >>> open function >>> [Metropolis-01:24565] mca: base: components_open: Looking for hwloc >>> components >>> [Metropolis-01:24565] mca: base: components_open: opening hwloc components >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> hwloc142 >>> [Metropolis-01:24565] mca: base: components_open: component hwloc142 has no >>> register function >>> [Metropolis-01:24565] mca: base: components_open: component hwloc142 has no >>> open function >>> [Metropolis-01:24564] mca:base:select:(grpcomm) Querying component [bad] >>> [Metropolis-01:24564] mca:base:select:(grpcomm) Query of component [bad] >>> set priority to 10 >>> [Metropolis-01:24564] mca:base:select:(grpcomm) Selected component [bad] >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive start comm >>> [Metropolis-01:24564] computing locality - getting object at level CORE, >>> index 0 >>> [Metropolis-01:24564] hwloc:base: get available cpus >>> [Metropolis-01:24564] hwloc:base:get_available_cpus first time - filtering >>> cpus >>> [Metropolis-01:24564] hwloc:base: no cpus specified - using root available >>> cpuset >>> [Metropolis-01:24564] computing locality - getting object at level CORE, >>> index 1 >>> [Metropolis-01:24564] hwloc:base: get available cpus >>> [Metropolis-01:24564] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24564] computing locality - shifting up from L1CACHE >>> [Metropolis-01:24564] computing 
locality - shifting up from L2CACHE >>> [Metropolis-01:24564] computing locality - shifting up from L3CACHE >>> [Metropolis-01:24564] computing locality - filling level SOCKET >>> [Metropolis-01:24564] computing locality - filling level NUMA >>> [Metropolis-01:24564] locality: CL:CU:N:B:Nu:S >>> [Metropolis-01:24565] mca:base:select:(grpcomm) Querying component [bad] >>> [Metropolis-01:24565] mca:base:select:(grpcomm) Query of component [bad] >>> set priority to 10 >>> [Metropolis-01:24565] mca:base:select:(grpcomm) Selected component [bad] >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive start comm >>> [Metropolis-01:24564] mca: base: components_open: Looking for coll >>> components >>> [Metropolis-01:24564] mca: base: components_open: opening coll components >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> tuned >>> [Metropolis-01:24564] mca: base: components_open: component tuned has no >>> register function >>> [Metropolis-01:24564] coll:tuned:component_open: done! 
>>> [Metropolis-01:24564] mca: base: components_open: component tuned open >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: found loaded component sm >>> [Metropolis-01:24564] mca: base: components_open: component sm register >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: component sm has no open >>> function >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> libnbc >>> [Metropolis-01:24564] mca: base: components_open: component libnbc register >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: component libnbc open >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> hierarch >>> [Metropolis-01:24564] mca: base: components_open: component hierarch has no >>> register function >>> [Metropolis-01:24564] mca: base: components_open: component hierarch open >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> basic >>> [Metropolis-01:24564] mca: base: components_open: component basic register >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: component basic has no >>> open function >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> inter >>> [Metropolis-01:24564] mca: base: components_open: component inter has no >>> register function >>> [Metropolis-01:24564] mca: base: components_open: component inter open >>> function successful >>> [Metropolis-01:24564] mca: base: components_open: found loaded component >>> self >>> [Metropolis-01:24564] mca: base: components_open: component self has no >>> register function >>> [Metropolis-01:24564] mca: base: components_open: component self open >>> function successful >>> [Metropolis-01:24565] computing locality - getting object at level CORE, >>> index 1 >>> [Metropolis-01:24565] hwloc:base: get available cpus >>> [Metropolis-01:24565] 
hwloc:base:get_available_cpus first time - filtering >>> cpus >>> [Metropolis-01:24565] hwloc:base: no cpus specified - using root available >>> cpuset >>> [Metropolis-01:24565] hwloc:base: get available cpus >>> [Metropolis-01:24565] hwloc:base:filter_cpus specified - already done >>> [Metropolis-01:24565] computing locality - getting object at level CORE, >>> index 0 >>> [Metropolis-01:24565] computing locality - shifting up from L1CACHE >>> [Metropolis-01:24565] computing locality - shifting up from L2CACHE >>> [Metropolis-01:24565] computing locality - shifting up from L3CACHE >>> [Metropolis-01:24565] computing locality - filling level SOCKET >>> [Metropolis-01:24565] computing locality - filling level NUMA >>> [Metropolis-01:24565] locality: CL:CU:N:B:Nu:S >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0] >>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 0 >>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO >>> PARTICIPANTS >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 0 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 0 >>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:modex: performing modex >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:pack_modex: reporting 4 >>> entries >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:full:modex: executing >>> allgather >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering allgather >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad allgather underway >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:modex: modex posted >>> [Metropolis-01:24565] mca: base: components_open: Looking for coll >>> components >>> [Metropolis-01:24565] mca: base: components_open: opening coll components >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> tuned >>> [Metropolis-01:24565] mca: base: components_open: component tuned has no >>> 
register function >>> [Metropolis-01:24565] coll:tuned:component_open: done! >>> [Metropolis-01:24565] mca: base: components_open: component tuned open >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: found loaded component sm >>> [Metropolis-01:24565] mca: base: components_open: component sm register >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: component sm has no open >>> function >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> libnbc >>> [Metropolis-01:24565] mca: base: components_open: component libnbc register >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: component libnbc open >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> hierarch >>> [Metropolis-01:24565] mca: base: components_open: component hierarch has no >>> register function >>> [Metropolis-01:24565] mca: base: components_open: component hierarch open >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> basic >>> [Metropolis-01:24565] mca: base: components_open: component basic register >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: component basic has no >>> open function >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> inter >>> [Metropolis-01:24565] mca: base: components_open: component inter has no >>> register function >>> [Metropolis-01:24565] mca: base: components_open: component inter open >>> function successful >>> [Metropolis-01:24565] mca: base: components_open: found loaded component >>> self >>> [Metropolis-01:24565] mca: base: components_open: component self has no >>> register function >>> [Metropolis-01:24565] mca: base: components_open: component self open >>> function successful >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1] >>> [Metropolis-01:24563] 
[[36265,0],0] WORKING COLLECTIVE 0 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 0 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 0 >>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 0 LOCALLY COMPLETE - SENDING >>> TO GLOBAL COLLECTIVE >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon >>> collective recvd from [[36265,0],0] >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING >>> COLLECTIVE 0 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM CONTRIBS: >>> 2 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job [36265,1] >>> tag 30 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient list >>> is empty! >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:modex: performing modex >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:pack_modex: reporting 4 >>> entries >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:full:modex: executing >>> allgather >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering allgather >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad allgather underway >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:modex: modex posted >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing >>> collective return for id 0 >>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 0 >>> [Metropolis-01:24564] [[36265,1],0] STORING MODEX DATA >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:store_modex adding modex >>> entry for proc [[36265,1],0] >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing >>> collective return for id 0 >>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 0 >>> [Metropolis-01:24565] [[36265,1],1] STORING MODEX DATA >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:store_modex adding modex >>> entry for 
proc [[36265,1],0] >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:update_modex_entries: >>> adding 4 entries for proc [[36265,1],0] >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:store_modex adding modex >>> entry for proc [[36265,1],1] >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:update_modex_entries: >>> adding 4 entries for proc [[36265,1],1] >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:update_modex_entries: >>> adding 4 entries for proc [[36265,1],0] >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:store_modex adding modex >>> entry for proc [[36265,1],1] >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:update_modex_entries: >>> adding 4 entries for proc [[36265,1],1] >>> [Metropolis-01:24564] coll:find_available: querying coll component tuned >>> [Metropolis-01:24564] coll:find_available: coll component tuned is available >>> [Metropolis-01:24565] coll:find_available: querying coll component tuned >>> [Metropolis-01:24565] coll:find_available: coll component tuned is available >>> [Metropolis-01:24565] coll:find_available: querying coll component sm >>> [Metropolis-01:24564] coll:find_available: querying coll component sm >>> [Metropolis-01:24564] coll:sm:init_query: no other local procs; >>> disqualifying myself >>> [Metropolis-01:24564] coll:find_available: coll component sm is not >>> available >>> [Metropolis-01:24564] coll:find_available: querying coll component libnbc >>> [Metropolis-01:24564] coll:find_available: coll component libnbc is >>> available >>> [Metropolis-01:24564] coll:find_available: querying coll component hierarch >>> [Metropolis-01:24564] coll:find_available: coll component hierarch is >>> available >>> [Metropolis-01:24564] coll:find_available: querying coll component basic >>> [Metropolis-01:24564] coll:find_available: coll component basic is available >>> [Metropolis-01:24565] coll:sm:init_query: no other local procs; >>> disqualifying myself >>> [Metropolis-01:24565] coll:find_available: coll 
component sm is not >>> available >>> [Metropolis-01:24565] coll:find_available: querying coll component libnbc >>> [Metropolis-01:24565] coll:find_available: coll component libnbc is >>> available >>> [Metropolis-01:24565] coll:find_available: querying coll component hierarch >>> [Metropolis-01:24565] coll:find_available: coll component hierarch is >>> available >>> [Metropolis-01:24565] coll:find_available: querying coll component basic >>> [Metropolis-01:24565] coll:find_available: coll component basic is available >>> [Metropolis-01:24564] coll:find_available: querying coll component inter >>> [Metropolis-01:24564] coll:find_available: coll component inter is available >>> [Metropolis-01:24564] coll:find_available: querying coll component self >>> [Metropolis-01:24564] coll:find_available: coll component self is available >>> [Metropolis-01:24565] coll:find_available: querying coll component inter >>> [Metropolis-01:24565] coll:find_available: coll component inter is available >>> [Metropolis-01:24565] coll:find_available: querying coll component self >>> [Metropolis-01:24565] coll:find_available: coll component self is available >>> [Metropolis-01:24565] hwloc:base:get_nbojbs computed data 0 of NUMANode:0 >>> [Metropolis-01:24564] hwloc:base:get_nbojbs computed data 0 of NUMANode:0 >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1] >>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 1 >>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO >>> PARTICIPANTS >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 1 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 1 >>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0] >>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 1 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 1 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 1 >>> 
[Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 1 LOCALLY COMPLETE - SENDING >>> TO GLOBAL COLLECTIVE >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon >>> collective recvd from [[36265,0],0] >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING >>> COLLECTIVE 1 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM CONTRIBS: >>> 2 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job [36265,1] >>> tag 30 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient list >>> is empty! >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering barrier >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad barrier underway >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering barrier >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad barrier underway >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing >>> collective return for id 1 >>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 1 >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing >>> collective return for id 1 >>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 1 >>> [Metropolis-01:24565] coll:base:comm_select: new communicator: >>> MPI_COMM_WORLD (cid 0) >>> [Metropolis-01:24565] coll:base:comm_select: Checking all available modules >>> [Metropolis-01:24565] coll:tuned:module_tuned query called >>> [Metropolis-01:24565] coll:base:comm_select: component available: tuned, >>> priority: 30 >>> [Metropolis-01:24565] coll:base:comm_select: component available: libnbc, >>> priority: 10 >>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>> hierarch >>> [Metropolis-01:24565] coll:base:comm_select: component available: basic, >>> priority: 10 >>> [Metropolis-01:24565] coll:base:comm_select: component not available: 
inter >>> [Metropolis-01:24565] coll:base:comm_select: component not available: self >>> [Metropolis-01:24565] coll:tuned:module_init called. >>> [Metropolis-01:24565] coll:tuned:module_init Tuned is in use >>> [Metropolis-01:24565] coll:base:comm_select: new communicator: >>> MPI_COMM_SELF (cid 1) >>> [Metropolis-01:24565] coll:base:comm_select: Checking all available modules >>> [Metropolis-01:24564] coll:base:comm_select: new communicator: >>> MPI_COMM_WORLD (cid 0) >>> [Metropolis-01:24564] coll:base:comm_select: Checking all available modules >>> [Metropolis-01:24564] coll:tuned:module_tuned query called >>> [Metropolis-01:24564] coll:base:comm_select: component available: tuned, >>> priority: 30 >>> [Metropolis-01:24564] coll:base:comm_select: component available: libnbc, >>> priority: 10 >>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>> hierarch >>> [Metropolis-01:24564] coll:base:comm_select: component available: basic, >>> priority: 10 >>> [Metropolis-01:24564] coll:base:comm_select: component not available: inter >>> [Metropolis-01:24564] coll:base:comm_select: component not available: self >>> [Metropolis-01:24564] coll:tuned:module_init called. 
>>> [Metropolis-01:24565] coll:tuned:module_tuned query called >>> [Metropolis-01:24565] coll:base:comm_select: component not available: tuned >>> [Metropolis-01:24565] coll:base:comm_select: component available: libnbc, >>> priority: 10 >>> [Metropolis-01:24565] coll:base:comm_select: component not available: >>> hierarch >>> [Metropolis-01:24565] coll:base:comm_select: component available: basic, >>> priority: 10 >>> [Metropolis-01:24565] coll:base:comm_select: component not available: inter >>> [Metropolis-01:24565] coll:base:comm_select: component available: self, >>> priority: 75 >>> [Metropolis-01:24564] coll:tuned:module_init Tuned is in use >>> [Metropolis-01:24564] coll:base:comm_select: new communicator: >>> MPI_COMM_SELF (cid 1) >>> [Metropolis-01:24564] coll:base:comm_select: Checking all available modules >>> [Metropolis-01:24564] coll:tuned:module_tuned query called >>> [Metropolis-01:24564] coll:base:comm_select: component not available: tuned >>> [Metropolis-01:24564] coll:base:comm_select: component available: libnbc, >>> priority: 10 >>> [Metropolis-01:24564] coll:base:comm_select: component not available: >>> hierarch >>> [Metropolis-01:24564] coll:base:comm_select: component available: basic, >>> priority: 10 >>> [Metropolis-01:24564] coll:base:comm_select: component not available: inter >>> [Metropolis-01:24564] coll:base:comm_select: component available: self, >>> priority: 75 >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad entering barrier >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],1] >>> [Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 2 >>> [Metropolis-01:24563] [[36265,0],0] ADDING [[36265,1],WILDCARD] TO >>> PARTICIPANTS >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 2 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 2 >>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE RECVD FROM [[36265,1],0] >>> 
[Metropolis-01:24563] [[36265,0],0] WORKING COLLECTIVE 2 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLLECTIVE 2 >>> [Metropolis-01:24563] [[36265,0],0] PROGRESSING COLL id 2 >>> [Metropolis-01:24563] [[36265,0],0] ALL LOCAL PROCS CONTRIBUTE 2 >>> [Metropolis-01:24563] [[36265,0],0] COLLECTIVE 2 LOCALLY COMPLETE - SENDING >>> TO GLOBAL COLLECTIVE >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: daemon >>> collective recvd from [[36265,0],0] >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: WORKING >>> COLLECTIVE 2 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:daemon_coll: NUM CONTRIBS: >>> 2 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job [36265,1] >>> tag 30 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient list >>> is empty! >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad entering barrier >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:bad barrier underway >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive processing >>> collective return for id 2 >>> [Metropolis-01:24564] [[36265,1],0] CHECKING COLL id 2 >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:bad barrier underway >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive processing >>> collective return for id 2 >>> [Metropolis-01:24565] [[36265,1],1] CHECKING COLL id 2 >>> [Metropolis-01:24565] coll:tuned:component_close: called >>> [Metropolis-01:24565] coll:tuned:component_close: done! 
>>> [Metropolis-01:24565] mca: base: close: component tuned closed >>> [Metropolis-01:24565] mca: base: close: unloading component tuned >>> [Metropolis-01:24565] mca: base: close: component libnbc closed >>> [Metropolis-01:24565] mca: base: close: unloading component libnbc >>> [Metropolis-01:24565] mca: base: close: unloading component hierarch >>> [Metropolis-01:24565] mca: base: close: unloading component basic >>> [Metropolis-01:24565] mca: base: close: unloading component inter >>> [Metropolis-01:24565] mca: base: close: unloading component self >>> [Metropolis-01:24565] [[36265,1],1] grpcomm:base:receive stop comm >>> [Metropolis-01:24564] coll:tuned:component_close: called >>> [Metropolis-01:24564] coll:tuned:component_close: done! >>> [Metropolis-01:24564] mca: base: close: component tuned closed >>> [Metropolis-01:24564] mca: base: close: unloading component tuned >>> [Metropolis-01:24564] mca: base: close: component libnbc closed >>> [Metropolis-01:24564] mca: base: close: unloading component libnbc >>> [Metropolis-01:24564] mca: base: close: unloading component hierarch >>> [Metropolis-01:24564] mca: base: close: unloading component basic >>> [Metropolis-01:24564] mca: base: close: unloading component inter >>> [Metropolis-01:24564] mca: base: close: unloading component self >>> [Metropolis-01:24564] [[36265,1],0] grpcomm:base:receive stop comm >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:bad:xcast sent to job [36265,0] >>> tag 1 >>> [Metropolis-01:24563] [[36265,0],0] grpcomm:xcast:recv:send_relay >>> [Metropolis-01:24563] [[36265,0],0] orte:daemon:send_relay - recipient list >>> is empty! >>> [jarico@Metropolis-01 examples]$ >>> >>> >>> >>> El 03/07/2012, a las 21:44, Ralph Castain escribió: >>> >>> > Interesting - yes, coll sm doesn't think they are on the same node for >>> > some reason. 
Try adding -mca grpcomm_base_verbose 5 and let's see why >>> > >>> > >>> > On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote: >>> > >>> >> The code I run is a simple broadcast. >>> >> >>> >> When I do not specify components to run, the output is (more verbose): >>> >> >>> >> [jarico@Metropolis-01 examples]$ >>> >> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca >>> >> mca_base_verbose 100 --mca mca_coll_base_output 100 --mca >>> >> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca >>> >> mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 >>> >> -n 2 ./bmem >>> >> [Metropolis-01:24490] mca: base: components_open: Looking for hwloc >>> >> components >>> >> [Metropolis-01:24490] mca: base: components_open: opening hwloc >>> >> components >>> >> [Metropolis-01:24490] mca: base: components_open: found loaded component >>> >> hwloc142 >>> >> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has >>> >> no register function >>> >> [Metropolis-01:24490] mca: base: components_open: component hwloc142 has >>> >> no open function >>> >> [Metropolis-01:24490] hwloc:base:get_topology >>> >> [Metropolis-01:24490] hwloc:base: no cpus specified - using root >>> >> available cpuset >>> >> >>> >> ======================== JOB MAP ======================== >>> >> >>> >> Data for node: Metropolis-01 Num procs: 2 >>> >> Process OMPI jobid: [36336,1] App: 0 Process rank: 0 >>> >> Process OMPI jobid: [36336,1] App: 0 Process rank: 1 >>> >> >>> >> ============================================================= >>> >> [Metropolis-01:24491] mca: base: components_open: Looking for hwloc >>> >> components >>> >> [Metropolis-01:24491] mca: base: components_open: opening hwloc >>> >> components >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> hwloc142 >>> >> [Metropolis-01:24491] mca: base: components_open: component hwloc142 has >>> >> no register function >>> >> 
[Metropolis-01:24491] mca: base: components_open: component hwloc142 has >>> >> no open function >>> >> [Metropolis-01:24492] mca: base: components_open: Looking for hwloc >>> >> components >>> >> [Metropolis-01:24492] mca: base: components_open: opening hwloc >>> >> components >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> hwloc142 >>> >> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has >>> >> no register function >>> >> [Metropolis-01:24492] mca: base: components_open: component hwloc142 has >>> >> no open function >>> >> [Metropolis-01:24491] locality: CL:CU:N:B >>> >> [Metropolis-01:24491] hwloc:base: get available cpus >>> >> [Metropolis-01:24491] hwloc:base:get_available_cpus first time - >>> >> filtering cpus >>> >> [Metropolis-01:24491] hwloc:base: no cpus specified - using root >>> >> available cpuset >>> >> [Metropolis-01:24491] hwloc:base:get_available_cpus root object >>> >> [Metropolis-01:24491] mca: base: components_open: Looking for coll >>> >> components >>> >> [Metropolis-01:24491] mca: base: components_open: opening coll components >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> tuned >>> >> [Metropolis-01:24491] mca: base: components_open: component tuned has no >>> >> register function >>> >> [Metropolis-01:24491] coll:tuned:component_open: done! 
>>> >> [Metropolis-01:24491] mca: base: components_open: component tuned open >>> >> function successful >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> sm >>> >> [Metropolis-01:24491] mca: base: components_open: component sm register >>> >> function successful >>> >> [Metropolis-01:24491] mca: base: components_open: component sm has no >>> >> open function >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> libnbc >>> >> [Metropolis-01:24491] mca: base: components_open: component libnbc >>> >> register function successful >>> >> [Metropolis-01:24491] mca: base: components_open: component libnbc open >>> >> function successful >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> hierarch >>> >> [Metropolis-01:24491] mca: base: components_open: component hierarch has >>> >> no register function >>> >> [Metropolis-01:24491] mca: base: components_open: component hierarch >>> >> open function successful >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> basic >>> >> [Metropolis-01:24491] mca: base: components_open: component basic >>> >> register function successful >>> >> [Metropolis-01:24491] mca: base: components_open: component basic has no >>> >> open function >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> inter >>> >> [Metropolis-01:24491] mca: base: components_open: component inter has no >>> >> register function >>> >> [Metropolis-01:24491] mca: base: components_open: component inter open >>> >> function successful >>> >> [Metropolis-01:24491] mca: base: components_open: found loaded component >>> >> self >>> >> [Metropolis-01:24491] mca: base: components_open: component self has no >>> >> register function >>> >> [Metropolis-01:24491] mca: base: components_open: component self open >>> >> function successful >>> >> [Metropolis-01:24492] locality: CL:CU:N:B >>> >> 
[Metropolis-01:24492] hwloc:base: get available cpus >>> >> [Metropolis-01:24492] hwloc:base:get_available_cpus first time - >>> >> filtering cpus >>> >> [Metropolis-01:24492] hwloc:base: no cpus specified - using root >>> >> available cpuset >>> >> [Metropolis-01:24492] hwloc:base:get_available_cpus root object >>> >> [Metropolis-01:24492] mca: base: components_open: Looking for coll >>> >> components >>> >> [Metropolis-01:24492] mca: base: components_open: opening coll components >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> tuned >>> >> [Metropolis-01:24492] mca: base: components_open: component tuned has no >>> >> register function >>> >> [Metropolis-01:24492] coll:tuned:component_open: done! >>> >> [Metropolis-01:24492] mca: base: components_open: component tuned open >>> >> function successful >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> sm >>> >> [Metropolis-01:24492] mca: base: components_open: component sm register >>> >> function successful >>> >> [Metropolis-01:24492] mca: base: components_open: component sm has no >>> >> open function >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> libnbc >>> >> [Metropolis-01:24492] mca: base: components_open: component libnbc >>> >> register function successful >>> >> [Metropolis-01:24492] mca: base: components_open: component libnbc open >>> >> function successful >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> hierarch >>> >> [Metropolis-01:24492] mca: base: components_open: component hierarch has >>> >> no register function >>> >> [Metropolis-01:24492] mca: base: components_open: component hierarch >>> >> open function successful >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> basic >>> >> [Metropolis-01:24492] mca: base: components_open: component basic >>> >> register function successful >>> >> [Metropolis-01:24492] 
mca: base: components_open: component basic has no >>> >> open function >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> inter >>> >> [Metropolis-01:24492] mca: base: components_open: component inter has no >>> >> register function >>> >> [Metropolis-01:24492] mca: base: components_open: component inter open >>> >> function successful >>> >> [Metropolis-01:24492] mca: base: components_open: found loaded component >>> >> self >>> >> [Metropolis-01:24492] mca: base: components_open: component self has no >>> >> register function >>> >> [Metropolis-01:24492] mca: base: components_open: component self open >>> >> function successful >>> >> [Metropolis-01:24491] coll:find_available: querying coll component tuned >>> >> [Metropolis-01:24491] coll:find_available: coll component tuned is >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: querying coll component sm >>> >> [Metropolis-01:24491] coll:sm:init_query: no other local procs; >>> >> disqualifying myself >>> >> [Metropolis-01:24491] coll:find_available: coll component sm is not >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: querying coll component libnbc >>> >> [Metropolis-01:24491] coll:find_available: coll component libnbc is >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: querying coll component >>> >> hierarch >>> >> [Metropolis-01:24491] coll:find_available: coll component hierarch is >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: querying coll component basic >>> >> [Metropolis-01:24491] coll:find_available: coll component basic is >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: querying coll component inter >>> >> [Metropolis-01:24492] coll:find_available: querying coll component tuned >>> >> [Metropolis-01:24492] coll:find_available: coll component tuned is >>> >> available >>> >> [Metropolis-01:24492] coll:find_available: querying coll component sm >>> >> [Metropolis-01:24492] 
coll:sm:init_query: no other local procs; >>> >> disqualifying myself >>> >> [Metropolis-01:24492] coll:find_available: coll component sm is not >>> >> available >>> >> [Metropolis-01:24492] coll:find_available: querying coll component libnbc >>> >> [Metropolis-01:24492] coll:find_available: coll component libnbc is >>> >> available >>> >> [Metropolis-01:24492] coll:find_available: querying coll component >>> >> hierarch >>> >> [Metropolis-01:24492] coll:find_available: coll component hierarch is >>> >> available >>> >> [Metropolis-01:24492] coll:find_available: querying coll component basic >>> >> [Metropolis-01:24492] coll:find_available: coll component basic is >>> >> available >>> >> [Metropolis-01:24492] coll:find_available: querying coll component inter >>> >> [Metropolis-01:24492] coll:find_available: coll component inter is >>> >> available >>> >> [Metropolis-01:24492] coll:find_available: querying coll component self >>> >> [Metropolis-01:24492] coll:find_available: coll component self is >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: coll component inter is >>> >> available >>> >> [Metropolis-01:24491] coll:find_available: querying coll component self >>> >> [Metropolis-01:24491] coll:find_available: coll component self is >>> >> available >>> >> [Metropolis-01:24492] hwloc:base:get_nbojbs computed data 0 of NUMANode:0 >>> >> [Metropolis-01:24491] hwloc:base:get_nbojbs computed data 0 of NUMANode:0 >>> >> [Metropolis-01:24491] coll:base:comm_select: new communicator: >>> >> MPI_COMM_WORLD (cid 0) >>> >> [Metropolis-01:24491] coll:base:comm_select: Checking all available >>> >> modules >>> >> [Metropolis-01:24491] coll:tuned:module_tuned query called >>> >> [Metropolis-01:24491] coll:base:comm_select: component available: tuned, >>> >> priority: 30 >>> >> [Metropolis-01:24491] coll:base:comm_select: component available: >>> >> libnbc, priority: 10 >>> >> [Metropolis-01:24491] coll:base:comm_select: component not available: >>> >> 
hierarch >>> >> [Metropolis-01:24491] coll:base:comm_select: component available: basic, >>> >> priority: 10 >>> >> [Metropolis-01:24491] coll:base:comm_select: component not available: >>> >> inter >>> >> [Metropolis-01:24491] coll:base:comm_select: component not available: >>> >> self >>> >> [Metropolis-01:24491] coll:tuned:module_init called. >>> >> [Metropolis-01:24491] coll:tuned:module_init Tuned is in use >>> >> [Metropolis-01:24491] coll:base:comm_select: new communicator: >>> >> MPI_COMM_SELF (cid 1) >>> >> [Metropolis-01:24491] coll:base:comm_select: Checking all available >>> >> modules >>> >> [Metropolis-01:24491] coll:tuned:module_tuned query called >>> >> [Metropolis-01:24491] coll:base:comm_select: component not available: >>> >> tuned >>> >> [Metropolis-01:24491] coll:base:comm_select: component available: >>> >> libnbc, priority: 10 >>> >> [Metropolis-01:24491] coll:base:comm_select: component not available: >>> >> hierarch >>> >> [Metropolis-01:24491] coll:base:comm_select: component available: basic, >>> >> priority: 10 >>> >> [Metropolis-01:24491] coll:base:comm_select: component not available: >>> >> inter >>> >> [Metropolis-01:24491] coll:base:comm_select: component available: self, >>> >> priority: 75 >>> >> [Metropolis-01:24492] coll:base:comm_select: new communicator: >>> >> MPI_COMM_WORLD (cid 0) >>> >> [Metropolis-01:24492] coll:base:comm_select: Checking all available >>> >> modules >>> >> [Metropolis-01:24492] coll:tuned:module_tuned query called >>> >> [Metropolis-01:24492] coll:base:comm_select: component available: tuned, >>> >> priority: 30 >>> >> [Metropolis-01:24492] coll:base:comm_select: component available: >>> >> libnbc, priority: 10 >>> >> [Metropolis-01:24492] coll:base:comm_select: component not available: >>> >> hierarch >>> >> [Metropolis-01:24492] coll:base:comm_select: component available: basic, >>> >> priority: 10 >>> >> [Metropolis-01:24492] coll:base:comm_select: component not available: >>> >> inter >>> >> 
[Metropolis-01:24492] coll:base:comm_select: component not available: >>> >> self >>> >> [Metropolis-01:24492] coll:tuned:module_init called. >>> >> [Metropolis-01:24492] coll:tuned:module_init Tuned is in use >>> >> [Metropolis-01:24492] coll:base:comm_select: new communicator: >>> >> MPI_COMM_SELF (cid 1) >>> >> [Metropolis-01:24492] coll:base:comm_select: Checking all available >>> >> modules >>> >> [Metropolis-01:24492] coll:tuned:module_tuned query called >>> >> [Metropolis-01:24492] coll:base:comm_select: component not available: >>> >> tuned >>> >> [Metropolis-01:24492] coll:base:comm_select: component available: >>> >> libnbc, priority: 10 >>> >> [Metropolis-01:24492] coll:base:comm_select: component not available: >>> >> hierarch >>> >> [Metropolis-01:24492] coll:base:comm_select: component available: basic, >>> >> priority: 10 >>> >> [Metropolis-01:24492] coll:base:comm_select: component not available: >>> >> inter >>> >> [Metropolis-01:24492] coll:base:comm_select: component available: self, >>> >> priority: 75 >>> >> [Metropolis-01:24491] coll:tuned:component_close: called >>> >> [Metropolis-01:24491] coll:tuned:component_close: done! >>> >> [Metropolis-01:24492] coll:tuned:component_close: called >>> >> [Metropolis-01:24492] coll:tuned:component_close: done! 
>>> >> [Metropolis-01:24492] mca: base: close: component tuned closed >>> >> [Metropolis-01:24492] mca: base: close: unloading component tuned >>> >> [Metropolis-01:24492] mca: base: close: component libnbc closed >>> >> [Metropolis-01:24492] mca: base: close: unloading component libnbc >>> >> [Metropolis-01:24492] mca: base: close: unloading component hierarch >>> >> [Metropolis-01:24492] mca: base: close: unloading component basic >>> >> [Metropolis-01:24492] mca: base: close: unloading component inter >>> >> [Metropolis-01:24492] mca: base: close: unloading component self >>> >> [Metropolis-01:24491] mca: base: close: component tuned closed >>> >> [Metropolis-01:24491] mca: base: close: unloading component tuned >>> >> [Metropolis-01:24491] mca: base: close: component libnbc closed >>> >> [Metropolis-01:24491] mca: base: close: unloading component libnbc >>> >> [Metropolis-01:24491] mca: base: close: unloading component hierarch >>> >> [Metropolis-01:24491] mca: base: close: unloading component basic >>> >> [Metropolis-01:24491] mca: base: close: unloading component inter >>> >> [Metropolis-01:24491] mca: base: close: unloading component self >>> >> [jarico@Metropolis-01 examples]$ >>> >> >>> >> >>> >> SM is not loaded because it detects no other processes on the same machine: >>> >> >>> >> [Metropolis-01:24491] coll:sm:init_query: no other local procs; >>> >> disqualifying myself >>> >> >>> >> The machine is a multicore machine with 8 cores. >>> >> >>> >> I need to run the SM component code, and I suppose that, once this problem >>> >> is solved, raising its priority will get it selected. >>> >> >>> >> >>> >> >>> >> On 03/07/2012, at 21:01, Jeff Squyres wrote: >>> >> >>> >>> The issue is that the "sm" coll component only implements a few of the >>> >>> MPI collective operations. It is usually mixed at run-time with other >>> >>> coll components to fill out the rest of the MPI collective operations. 
>>> >>> >>> >>> So what is happening is that OMPI is determining that it doesn't have >>> >>> implementations of all the MPI collective operations and aborting. >>> >>> >>> >>> You shouldn't need to manually select your coll module -- OMPI should >>> >>> automatically select the right collective module for you. E.g., if all >>> >>> procs are local on a single machine and sm has a matching >>> >>> implementation for that MPI collective operation, it'll be used. >>> >>> >>> >>> >>> >>> >>> >>> On Jul 3, 2012, at 2:48 PM, Juan Antonio Rico Gallego wrote: >>> >>> >>> >>>> Output is: >>> >>>> >>> >>>> [Metropolis-01:15355] hwloc:base:get_topology >>> >>>> [Metropolis-01:15355] hwloc:base: no cpus specified - using root >>> >>>> available cpuset >>> >>>> >>> >>>> ======================== JOB MAP ======================== >>> >>>> >>> >>>> Data for node: Metropolis-01 Num procs: 2 >>> >>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 0 >>> >>>> Process OMPI jobid: [59809,1] App: 0 Process rank: 1 >>> >>>> >>> >>>> ============================================================= >>> >>>> [Metropolis-01:15356] locality: CL:CU:N:B >>> >>>> [Metropolis-01:15356] hwloc:base: get available cpus >>> >>>> [Metropolis-01:15356] hwloc:base:get_available_cpus first time - >>> >>>> filtering cpus >>> >>>> [Metropolis-01:15356] hwloc:base: no cpus specified - using root >>> >>>> available cpuset >>> >>>> [Metropolis-01:15356] hwloc:base:get_available_cpus root object >>> >>>> [Metropolis-01:15357] locality: CL:CU:N:B >>> >>>> [Metropolis-01:15357] hwloc:base: get available cpus >>> >>>> [Metropolis-01:15357] hwloc:base:get_available_cpus first time - >>> >>>> filtering cpus >>> >>>> [Metropolis-01:15357] hwloc:base: no cpus specified - using root >>> >>>> available cpuset >>> >>>> [Metropolis-01:15357] hwloc:base:get_available_cpus root object >>> >>>> [Metropolis-01:15356] hwloc:base:get_nbojbs computed data 0 of >>> >>>> NUMANode:0 >>> >>>> [Metropolis-01:15357] 
hwloc:base:get_nbojbs computed data 0 of >>> >>>> NUMANode:0 >>> >>>> >>> >>>> >>> >>>> Regards, >>> >>>> Juan A. Rico >>> >>>> _______________________________________________ >>> >>>> devel mailing list >>> >>>> de...@open-mpi.org >>> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> >>> >>> >>> >>> >>> -- >>> >>> Jeff Squyres >>> >>> jsquy...@cisco.com >>> >>> For corporate legal information go to: >>> >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/