Many thanks, Paul!

On Jan 9, 2014, at 3:07 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Ralph,
>
> Thanks for fielding all these issues I've been finding.
> I will plan to run tonight's trunk tarball through all of the systems where
> I've seen any issues.
>
> -Paul
>
>
> On Thu, Jan 9, 2014 at 8:40 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Should now be fixed in trunk (silently fall back to not binding if cores not
> found) - scheduled for 1.7.4. If you could test the next trunk tarball, that
> would help as I can't actually test it on my machines
>
>
> On Jan 9, 2014, at 6:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> I see the issue - there are no "cores" on this topology, only "pu's", so
>> "bind-to core" is going to fail even though binding is supported. Will
>> adjust.
>>
>> Thanks!
>>
>> On Jan 8, 2014, at 9:06 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>> Requested verbose output below.
>>> -Paul
>>>
>>> -bash-4.2$ mpirun -mca ess_base_verbose 10 -np 1 examples/ring_c
>>> [pcp-j-17:02150] mca: base: components_register: registering ess components
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component env
>>> [pcp-j-17:02150] mca: base: components_register: component env has no
>>> register or open function
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component hnp
>>> [pcp-j-17:02150] mca: base: components_register: component hnp has no
>>> register or open function
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component
>>> singleton
>>> [pcp-j-17:02150] mca: base: components_register: component singleton
>>> register function successful
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component tool
>>> [pcp-j-17:02150] mca: base: components_register: component tool has no
>>> register or open function
>>> [pcp-j-17:02150] mca: base: components_open: opening ess components
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component env
>>> [pcp-j-17:02150] mca: base: components_open: component env open function
>>> successful
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component hnp
>>> [pcp-j-17:02150] mca: base: components_open: component hnp open function
>>> successful
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component
>>> singleton
>>> [pcp-j-17:02150] mca: base: components_open: component singleton open
>>> function successful
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component tool
>>> [pcp-j-17:02150] mca: base: components_open: component tool open function
>>> successful
>>> [pcp-j-17:02150] mca:base:select: Auto-selecting ess components
>>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [env]
>>> [pcp-j-17:02150] mca:base:select:( ess) Skipping component [env]. Query
>>> failed to return a module
>>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [hnp]
>>> [pcp-j-17:02150] mca:base:select:( ess) Query of component [hnp] set
>>> priority to 100
>>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [singleton]
>>> [pcp-j-17:02150] mca:base:select:( ess) Skipping component [singleton].
>>> Query failed to return a module
>>> [pcp-j-17:02150] mca:base:select:( ess) Querying component [tool]
>>> [pcp-j-17:02150] mca:base:select:( ess) Skipping component [tool].
>>> Query failed to return a module
>>> [pcp-j-17:02150] mca:base:select:( ess) Selected component [hnp]
>>> [pcp-j-17:02150] mca: base: close: component env closed
>>> [pcp-j-17:02150] mca: base: close: unloading component env
>>> [pcp-j-17:02150] mca: base: close: component singleton closed
>>> [pcp-j-17:02150] mca: base: close: unloading component singleton
>>> [pcp-j-17:02150] mca: base: close: component tool closed
>>> [pcp-j-17:02150] mca: base: close: unloading component tool
>>> [pcp-j-17:02150] [[INVALID],INVALID] Topology Info:
>>> [pcp-j-17:02150] Type: Machine Number of child objects: 2
>>> Name=NULL
>>> Backend=NetBSD
>>> OSName=NetBSD
>>> OSRelease=6.1
>>> OSVersion="NetBSD 6.1 (CUSTOM) #0: Fri Sep 20 13:19:58 PDT 2013
>>> phargrov@pcp-j-17:/home/phargrov/CUSTOM"
>>> Architecture=i386
>>> Backend=x86
>>> Cpuset: 0x00000003
>>> Online: 0x00000003
>>> Allowed: 0x00000003
>>> Bind CPU proc: TRUE
>>> Bind CPU thread: TRUE
>>> Bind MEM proc: FALSE
>>> Bind MEM thread: FALSE
>>> Type: PU Number of child objects: 0
>>> Name=NULL
>>> Cpuset: 0x00000001
>>> Online: 0x00000001
>>> Allowed: 0x00000001
>>> Type: PU Number of child objects: 0
>>> Name=NULL
>>> Cpuset: 0x00000002
>>> Online: 0x00000002
>>> Allowed: 0x00000002
>>> --------------------------------------------------------------------------
>>> While computing bindings, we found no available cpus on
>>> the following node:
>>>
>>> Node: pcp-j-17
>>>
>>> Please check your allocation.
>>> --------------------------------------------------------------------------
>>> [pcp-j-17:02150] mca: base: close: component hnp closed
>>> [pcp-j-17:02150] mca: base: close: unloading component hnp
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Hmmm...looks to me like the code should protect against this - unless the
>>> system isn't correctly reporting binding support. Could you run this with
>>> "-mca ess_base_verbose 10"?
>>> This will output the topology we found,
>>> including the binding support (which isn't in the usual output).
>>>
>>> On Jan 8, 2014, at 8:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Hmmm...I see the problem. Looks like binding isn't supported on that
>>>> system for some reason, so we need to turn "off" our auto-binding when we
>>>> hit that condition. I'll check to see why that isn't happening (was
>>>> supposed to do so)
>>>>
>>>>
>>>> On Jan 8, 2014, at 3:43 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>
>>>>> While I have yet to get a working build on NetBSD for x86-64 h/w, I
>>>>> *have* successfully built Open MPI's current 1.7.4rc tarball on NetBSD-6
>>>>> for x86. However, I can't *run* anything:
>>>>>
>>>>> Attempting the ring_c example on 2 cores:
>>>>> -bash-4.2$ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>>> --------------------------------------------------------------------------
>>>>> While computing bindings, we found no available cpus on
>>>>> the following node:
>>>>>
>>>>> Node: pcp-j-17
>>>>>
>>>>> Please check your allocation.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> The failure is the same w/o "-mca btl sm,self"
>>>>> Singleton runs fail just as the np=2 run did.
>>>>>
>>>>> I've attached compressed output from "ompi_info --all".
>>>>>
>>>>> Since this is probably an hwloc-related issue, I also built hwloc-1.7.2
>>>>> from pristine sources.
>>>>> I have attached compressed output of lstopo which NOTABLY indicates a
>>>>> failure to bind to both of the CPUs.
>>>>>
>>>>> For now, an explicit "--bind-to none" is working for me.
>>>>> Please let me know what additional info may be required.
>>>>>
>>>>> -Paul
>>>>>
>>>>> --
>>>>> Paul H. Hargrove phhargr...@lbl.gov
>>>>> Future Technologies Group
>>>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>>> <ompi_info-netbsd-x86.txt.bz2><lstopo172-netbsd-x86.txt.bz2>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>>
>

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/