Should now be fixed in trunk (we silently fall back to not binding if no cores are found) - scheduled for 1.7.4. If you could test the next trunk tarball, that would help, as I can't actually test it on my machines.
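
For readers following along, here is a rough sketch of the fallback described above. It is not the actual Open MPI code: the TopoObj class and the count_objects and resolve_binding helpers are hypothetical names made up for illustration, and the topology is modeled by hand from the verbose output quoted below (one Machine with two PU children and no Core objects).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopoObj:
    """Hypothetical stand-in for one object in a hardware topology tree."""
    kind: str                      # e.g. "Machine", "Core", "PU"
    cpuset: int                    # bitmask of the CPUs this object covers
    children: List["TopoObj"] = field(default_factory=list)


def count_objects(root: TopoObj, kind: str) -> int:
    """Count objects of the given kind in the tree rooted at root."""
    own = 1 if root.kind == kind else 0
    return own + sum(count_objects(child, kind) for child in root.children)


def resolve_binding(root: TopoObj, requested: str,
                    binding_supported: bool = True) -> str:
    """Return the binding policy to use for a requested policy.

    Falls back to "none" without raising an error when binding is
    unsupported or when the topology has no objects of the requested kind.
    """
    if requested == "none" or not binding_supported:
        return "none"
    # Illustrative mapping from policy name to topology object kind.
    target_kind = {"core": "Core", "hwthread": "PU"}[requested]
    if count_objects(root, target_kind) == 0:
        return "none"              # silent fallback: nothing to bind to
    return requested


# Topology as reported in the thread: binding is supported, but only PUs exist.
netbsd_x86 = TopoObj("Machine", 0x3, [TopoObj("PU", 0x1), TopoObj("PU", 0x2)])

print(resolve_binding(netbsd_x86, "core"))      # prints "none"
print(resolve_binding(netbsd_x86, "hwthread"))  # prints "hwthread"
```

The point of the sketch is only that a request to bind to cores degrades to no binding when the topology has no core objects, instead of aborting with the "no available cpus" error.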

On Jan 9, 2014, at 6:25 AM, Ralph Castain <r...@open-mpi.org> wrote:

> I see the issue - there are no "cores" on this topology, only "pu's", so
> "bind-to core" is going to fail even though binding is supported. Will adjust.
>
> Thanks!
>
> On Jan 8, 2014, at 9:06 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> Requested verbose output below.
>> -Paul
>>
>> -bash-4.2$ mpirun -mca ess_base_verbose 10 -np 1 examples/ring_c
>> [pcp-j-17:02150] mca: base: components_register: registering ess components
>> [pcp-j-17:02150] mca: base: components_register: found loaded component env
>> [pcp-j-17:02150] mca: base: components_register: component env has no
>> register or open function
>> [pcp-j-17:02150] mca: base: components_register: found loaded component hnp
>> [pcp-j-17:02150] mca: base: components_register: component hnp has no
>> register or open function
>> [pcp-j-17:02150] mca: base: components_register: found loaded component
>> singleton
>> [pcp-j-17:02150] mca: base: components_register: component singleton
>> register function successful
>> [pcp-j-17:02150] mca: base: components_register: found loaded component tool
>> [pcp-j-17:02150] mca: base: components_register: component tool has no
>> register or open function
>> [pcp-j-17:02150] mca: base: components_open: opening ess components
>> [pcp-j-17:02150] mca: base: components_open: found loaded component env
>> [pcp-j-17:02150] mca: base: components_open: component env open function
>> successful
>> [pcp-j-17:02150] mca: base: components_open: found loaded component hnp
>> [pcp-j-17:02150] mca: base: components_open: component hnp open function
>> successful
>> [pcp-j-17:02150] mca: base: components_open: found loaded component singleton
>> [pcp-j-17:02150] mca: base: components_open: component singleton open
>> function successful
>> [pcp-j-17:02150] mca: base: components_open: found loaded component tool
>> [pcp-j-17:02150] mca: base: components_open: component tool open function
>> successful
>> [pcp-j-17:02150] mca:base:select: Auto-selecting ess components
>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [env]
>> [pcp-j-17:02150] mca:base:select:( ess) Skipping component [env]. Query
>> failed to return a module
>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [hnp]
>> [pcp-j-17:02150] mca:base:select:( ess) Query of component [hnp] set
>> priority to 100
>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [singleton]
>> [pcp-j-17:02150] mca:base:select:( ess) Skipping component [singleton].
>> Query failed to return a module
>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [tool]
>> [pcp-j-17:02150] mca:base:select:( ess) Skipping component [tool]. Query
>> failed to return a module
>> [pcp-j-17:02150] mca:base:select:( ess) Selected component [hnp]
>> [pcp-j-17:02150] mca: base: close: component env closed
>> [pcp-j-17:02150] mca: base: close: unloading component env
>> [pcp-j-17:02150] mca: base: close: component singleton closed
>> [pcp-j-17:02150] mca: base: close: unloading component singleton
>> [pcp-j-17:02150] mca: base: close: component tool closed
>> [pcp-j-17:02150] mca: base: close: unloading component tool
>> [pcp-j-17:02150] [[INVALID],INVALID] Topology Info:
>> [pcp-j-17:02150] Type: Machine Number of child objects: 2
>> Name=NULL
>> Backend=NetBSD
>> OSName=NetBSD
>> OSRelease=6.1
>> OSVersion="NetBSD 6.1 (CUSTOM) #0: Fri Sep 20 13:19:58 PDT 2013
>> phargrov@pcp-j-17:/home/phargrov/CUSTOM"
>> Architecture=i386
>> Backend=x86
>> Cpuset: 0x00000003
>> Online: 0x00000003
>> Allowed: 0x00000003
>> Bind CPU proc: TRUE
>> Bind CPU thread: TRUE
>> Bind MEM proc: FALSE
>> Bind MEM thread: FALSE
>> Type: PU Number of child objects: 0
>> Name=NULL
>> Cpuset: 0x00000001
>> Online: 0x00000001
>> Allowed: 0x00000001
>> Type: PU Number of child objects: 0
>> Name=NULL
>> Cpuset: 0x00000002
>> Online: 0x00000002
>> Allowed: 0x00000002
>> --------------------------------------------------------------------------
>> While computing bindings, we found no available cpus on
>> the following node:
>>
>> Node: pcp-j-17
>>
>> Please check your allocation.
>> --------------------------------------------------------------------------
>> [pcp-j-17:02150] mca: base: close: component hnp closed
>> [pcp-j-17:02150] mca: base: close: unloading component hnp
>>
>> On Wed, Jan 8, 2014 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hmmm...looks to me like the code should protect against this - unless the
>> system isn't correctly reporting binding support. Could you run this with
>> "-mca ess_base_verbose 10"? This will output the topology we found,
>> including the binding support (which isn't in the usual output).
>>
>> On Jan 8, 2014, at 8:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Hmmm...I see the problem. Looks like binding isn't supported on that system
>>> for some reason, so we need to turn "off" our auto-binding when we hit that
>>> condition. I'll check to see why that isn't happening (was supposed to do
>>> so)
>>>
>>> On Jan 8, 2014, at 3:43 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> While I have yet to get a working build on NetBSD for x86-64 h/w, I *have*
>>>> successfully built Open MPI's current 1.7.4rc tarball on NetBSD-6 for x86.
>>>> However, I can't *run* anything:
>>>>
>>>> Attempting the ring_c example on 2 cores:
>>>> -bash-4.2$ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>> --------------------------------------------------------------------------
>>>> While computing bindings, we found no available cpus on
>>>> the following node:
>>>>
>>>> Node: pcp-j-17
>>>>
>>>> Please check your allocation.
>>>> --------------------------------------------------------------------------
>>>>
>>>> The failure is the same w/o "-mca btl sm,self"
>>>> Singleton runs fail just as the np=2 run did.
>>>>
>>>> I've attached compressed output from "ompi_info --all".
>>>>
>>>> Since this is probably an hwloc-related issue, I also built hwloc-1.7.2
>>>> from pristine sources.
>>>> I have attached compressed output of lstopo which NOTABLY indicates a
>>>> failure to bind to both of the CPUs.
>>>>
>>>> For now, an explicit "--bind-to none" is working for me.
>>>> Please let me know what additional info may be required.
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove phhargr...@lbl.gov
>>>> Future Technologies Group
>>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>> <ompi_info-netbsd-x86.txt.bz2><lstopo172-netbsd-x86.txt.bz2>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel