Many thanks, Paul!

On Jan 9, 2014, at 3:07 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
> 
> Thanks for fielding all these issues I've been finding.
> I will plan to run tonight's trunk tarball through all of the systems where 
> I've seen any issues.
> 
> -Paul
> 
> 
> On Thu, Jan 9, 2014 at 8:40 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Should now be fixed in trunk (it silently falls back to not binding if no 
> cores are found) - scheduled for 1.7.4. If you could test the next trunk 
> tarball, that would help, as I can't actually test it on my machines.
> 
> 
> On Jan 9, 2014, at 6:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> I see the issue - there are no "cores" on this topology, only "pu's", so 
>> "bind-to core" is going to fail even though binding is supported. Will 
>> adjust.
>> 
>> Thanks!
>> 
>> On Jan 8, 2014, at 9:06 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>>> Requested verbose output below.
>>> -Paul
>>> 
>>> -bash-4.2$ mpirun -mca ess_base_verbose 10 -np 1 examples/ring_c
>>> [pcp-j-17:02150] mca: base: components_register: registering ess components
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component env
>>> [pcp-j-17:02150] mca: base: components_register: component env has no 
>>> register or open function
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component hnp
>>> [pcp-j-17:02150] mca: base: components_register: component hnp has no 
>>> register or open function
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component 
>>> singleton
>>> [pcp-j-17:02150] mca: base: components_register: component singleton 
>>> register function successful
>>> [pcp-j-17:02150] mca: base: components_register: found loaded component tool
>>> [pcp-j-17:02150] mca: base: components_register: component tool has no 
>>> register or open function
>>> [pcp-j-17:02150] mca: base: components_open: opening ess components
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component env
>>> [pcp-j-17:02150] mca: base: components_open: component env open function 
>>> successful
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component hnp
>>> [pcp-j-17:02150] mca: base: components_open: component hnp open function 
>>> successful
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component 
>>> singleton
>>> [pcp-j-17:02150] mca: base: components_open: component singleton open 
>>> function successful
>>> [pcp-j-17:02150] mca: base: components_open: found loaded component tool
>>> [pcp-j-17:02150] mca: base: components_open: component tool open function 
>>> successful
>>> [pcp-j-17:02150] mca:base:select: Auto-selecting ess components
>>> [pcp-j-17:02150] mca:base:select:(  ess) Querying component [env]
>>> [pcp-j-17:02150] mca:base:select:(  ess) Skipping component [env]. Query 
>>> failed to return a module
>>> [pcp-j-17:02150] mca:base:select:(  ess) Querying component [hnp]
>>> [pcp-j-17:02150] mca:base:select:(  ess) Query of component [hnp] set 
>>> priority to 100
>>> [pcp-j-17:02150] mca:base:select:(  ess) Querying component [singleton]
>>> [pcp-j-17:02150] mca:base:select:(  ess) Skipping component [singleton]. 
>>> Query failed to return a module
>>> [pcp-j-17:02150] mca:base:select:(  ess) Querying component [tool]
>>> [pcp-j-17:02150] mca:base:select:(  ess) Skipping component [tool]. Query 
>>> failed to return a module
>>> [pcp-j-17:02150] mca:base:select:(  ess) Selected component [hnp]
>>> [pcp-j-17:02150] mca: base: close: component env closed
>>> [pcp-j-17:02150] mca: base: close: unloading component env
>>> [pcp-j-17:02150] mca: base: close: component singleton closed
>>> [pcp-j-17:02150] mca: base: close: unloading component singleton
>>> [pcp-j-17:02150] mca: base: close: component tool closed
>>> [pcp-j-17:02150] mca: base: close: unloading component tool
>>> [pcp-j-17:02150] [[INVALID],INVALID] Topology Info:
>>> [pcp-j-17:02150] Type: Machine Number of child objects: 2
>>>         Name=NULL
>>>         Backend=NetBSD
>>>         OSName=NetBSD
>>>         OSRelease=6.1
>>>         OSVersion="NetBSD 6.1 (CUSTOM) #0: Fri Sep 20 13:19:58 PDT 2013 
>>> phargrov@pcp-j-17:/home/phargrov/CUSTOM"
>>>         Architecture=i386
>>>         Backend=x86
>>>         Cpuset:  0x00000003
>>>         Online:  0x00000003
>>>         Allowed: 0x00000003
>>>         Bind CPU proc:   TRUE
>>>         Bind CPU thread: TRUE
>>>         Bind MEM proc:   FALSE
>>>         Bind MEM thread: FALSE
>>>         Type: PU Number of child objects: 0
>>>                 Name=NULL
>>>                 Cpuset:  0x00000001
>>>                 Online:  0x00000001
>>>                 Allowed: 0x00000001
>>>         Type: PU Number of child objects: 0
>>>                 Name=NULL
>>>                 Cpuset:  0x00000002
>>>                 Online:  0x00000002
>>>                 Allowed: 0x00000002
>>> --------------------------------------------------------------------------
>>> While computing bindings, we found no available cpus on
>>> the following node:
>>> 
>>>   Node:  pcp-j-17
>>> 
>>> Please check your allocation.
>>> --------------------------------------------------------------------------
>>> [pcp-j-17:02150] mca: base: close: component hnp closed
>>> [pcp-j-17:02150] mca: base: close: unloading component hnp
>>> 
>>> 
>>> 
>>> On Wed, Jan 8, 2014 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Hmmm...looks to me like the code should protect against this - unless the 
>>> system isn't correctly reporting binding support. Could you run this with 
>>> "-mca ess_base_verbose 10"? This will output the topology we found, 
>>> including the binding support (which isn't in the usual output).
>>> 
>>> On Jan 8, 2014, at 8:14 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> Hmmm...I see the problem. Looks like binding isn't supported on that 
>>>> system for some reason, so we need to turn "off" our auto-binding when we 
>>>> hit that condition. I'll check to see why that isn't happening (was 
>>>> supposed to do so)
>>>> 
>>>> 
>>>> On Jan 8, 2014, at 3:43 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>> 
>>>>> While I have yet to get a working build on NetBSD for x86-64 h/w, I 
>>>>> *have* successfully built Open MPI's current 1.7.4rc tarball on NetBSD-6 
>>>>> for x86.  However, I can't *run* anything:
>>>>> 
>>>>> Attempting the ring_c example on 2 cores:
>>>>> -bash-4.2$ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>>> --------------------------------------------------------------------------
>>>>> While computing bindings, we found no available cpus on
>>>>> the following node:
>>>>> 
>>>>>   Node:  pcp-j-17
>>>>> 
>>>>> Please check your allocation.
>>>>> --------------------------------------------------------------------------
>>>>> 
>>>>> The failure is the same w/o "-mca btl sm,self"
>>>>> Singleton runs fail just as the np=2 run did.
>>>>> 
>>>>> I've attached compressed output from "ompi_info --all".
>>>>> 
>>>>> Since this is probably an hwloc-related issue, I also built hwloc-1.7.2 
>>>>> from pristine sources.
>>>>> I have attached compressed output of lstopo which NOTABLY indicates a 
>>>>> failure to bind to both of the CPUs.
>>>>> 
>>>>> For now, an explicit "--bind-to none" is working for me.
>>>>> Please let me know what additional info may be required.
>>>>> 
>>>>> -Paul
>>>>> 
>>>>> -- 
>>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>>> Future Technologies Group
>>>>> Computer and Data Sciences Department     Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>> <ompi_info-netbsd-x86.txt.bz2><lstopo172-netbsd-x86.txt.bz2>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> 
>>> 
>>> 
>> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
