If I remember correctly, Solaris won't let you bind to arbitrary sets of
PUs. It can bind to a single PU, a set of NUMA nodes, or the entire
machine, or something like that.
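
For reference, the two cpusets in the hwloc_set_cpubind errors quoted
below each cover two PUs, which would be consistent with that
restriction. A minimal sketch to decode such a mask in plain sh
(mask_to_pus is a made-up helper, not an hwloc tool, and it assumes the
mask is a single hex word that fits in a shell integer):

```sh
#!/bin/sh
# mask_to_pus: hypothetical helper, not part of hwloc.
# Prints the indexes of the bits set in a hex cpuset mask, such as the
# ones shown in the hwloc_set_cpubind error messages.
# Assumption: the mask is one non-negative hex word (no commas).
mask_to_pus() {
    mask=$(( $1 ))          # shell arithmetic understands the 0x prefix
    idx=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            # Append the index, space-separated after the first one.
            out="$out${out:+ }$idx"
        fi
        mask=$(( mask >> 1 ))
        idx=$(( idx + 1 ))
    done
    echo "$out"
}

mask_to_pus 0x00000003      # prints: 0 1
mask_to_pus 0x0000c000      # prints: 14 15
```

So both failing requests ask for exactly two PUs (one full core), while
the pu:0 request that did not fail asks for a single PU.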

hwloc-bind has a --strict option (that sets HWLOC_CPUBIND_STRICT). Maybe
that needs to be improved.
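
On the sidenote about launching the child: one option outside
hwloc-bind would be a small wrapper that refuses to launch when the
binding did not take effect. A rough sketch, where launch_if_bound is a
hypothetical helper (not part of hwloc) and both masks are assumed to be
single hex words like the ones hwloc-bind --get printed in this thread:

```sh
#!/bin/sh
# launch_if_bound: hypothetical helper, not part of hwloc.
# Usage: launch_if_bound EXPECTED_MASK ACTUAL_MASK command [args...]
# Runs the command only when both masks describe the same set of PUs.
# Assumption: each mask is one hex word (no comma-separated parts).
launch_if_bound() {
    expected=$1
    actual=$2
    shift 2
    # Compare numerically so that 0x3 and 0x00000003 count as equal.
    if [ $(( $expected )) -ne $(( $actual )) ]; then
        echo "bound to $actual, wanted $expected: not launching" >&2
        return 1
    fi
    "$@"
}
```

Inside a script started under hwloc-bind, it might be called as
launch_if_bound 0x00000003 "$(hwloc-bind --get)" ./report-bindings.sh,
so the child only runs when the reported binding matches the request.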

Brice




On 12/09/2012 16:16, Jeff Squyres wrote:
> Brice / Samuel --
>
> How well does hwloc work for process binding on Solaris?  This is not 
> something I've followed closely (note that Terry Dontje has moved on to other 
> projects inside Oracle, so he's no longer my go-to guy for All Things 
> Solaris...).
>
> Siegmar Gross (CC'ed) originally had a binding problem in Open MPI, but we've 
> narrowed it down to some simple binding tests with hwloc, just to avoid all 
> the OMPI complications.  
>
> I've asked him to run hwloc-bind on a few different configurations, and run 
> my report-bindings.sh script (see below) so that it reports where it was 
> actually bound.  He seems to get an hwloc error any time he tries to bind to 
> more than 1 PU.  Is that expected on Solaris?
>
> Sidenote: if hwloc-bind fails to bind, should we still launch the child 
> process?
>
> Here's my trivial report-bindings.sh script:
>
> -----
> #!/bin/sh
>
> bitmap=`hwloc-bind --get -p`
> friendly=`hwloc-calc -p -H socket.core.pu $bitmap`
>
> echo "MCW rank $OMPI_COMM_WORLD_RANK (`hostname`): $friendly"
> exit 0
> ------
>
> See Siegmar's detailed reply, below.
>
>
>
> On Sep 11, 2012, at 8:22 AM, Siegmar Gross wrote:
>
>> Hi,
>>
>> I have purged the old stuff in the mail.
>>
>>> It's concerning that you cannot bind to a full core (i.e., all
>>> the pu in a core). Does Solaris not allow you to bind to multiple
>>> pu's in a single process?
>> Unfortunately I don't know because I haven't used it up to now.
>> "mpstat" sees all hardware threads as cpu's.
>>
>> rs0 fd1026 104 mpstat
>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>  0    1   0   16   224    8   36    0    0    1    0   127    0   0   0 100
>>  1    1   0   38    69   40   38    0    0    1    0   146    0   0   0 100
>>  2    2   0   18    57   28   41    0    0    1    0   169    0   0   0 100
>>  3    1   0   14    40   11   40    0    0    1    0   152    0   0   0 100
>>  4    1   0   13    41   11   41    0    0    1    0   149    0   0   0 100
>>  5    1   0   17    43   12   42    0    0    1    0   178    0   0   0 100
>>  6    2   0   15    43   11   44    0    0    1    0   171    0   0   0 100
>>  7    1   0   14    42   11   41    0    0    1    0   156    0   0   0 100
>>  8    1   0   10    34    9   32    0    0    0    0    46    0   0   0 100
>>  9    1   0   11    34    9   32    0    0    1    0    82    0   0   0 100
>> 10    1   0   10    32    8   30    0    0    1    0    55    0   0   0 100
>> 11    0   0   10    31    8   29    0    0    0    0    51    0   0   0 100
>> 12    0   0    9    30    8   28    0    0    0    0    46    0   0   0 100
>> 13    1   0   11    29    7   27    0    0    0    0    59    0   0   0 100
>> 14    1   0   11    33    8   29    0    0    1    0    68    0   0   0 100
>> 15    0   0   11    29    7   26    0    0    0    0    48    0   0   0 100
>>
>>
>> I found the following addresses which state that it is possible
>> to bind a process to a processor set.
>>
>> http://developers.sun.com/solaris/articles/solaris_processor.html
>> http://stackoverflow.com/questions/10277221/binding-process-to-multiple-processors-on-sun-solaris-os
>>
>>
>>> Please repeat the hwloc-bind tests for both 1.3 and 1.5, but run
>>> the report bindings script instead of date. That will show where
>>> the child process was actually bound. 
>>
>> ssh rs0
>> cd hwloc
>> set path = ( `pwd`/hwloc-1.3.2/bin $path )
>> setenv LD_LIBRARY_PATH_32 `pwd`/hwloc-1.3.2/lib:${LD_LIBRARY_PATH_32}
>>
>>
>> I always get "errno 18 Cross-device link" if I use
>> "socket:*.core:*". No difference between "-l" and "-p". I
>> don't see differences in the output but I can provide the
>> output for all 16 hardware threads with both "-l" and "-p"
>> if you need it.
>>
>> rs0 hwloc 107 which hwloc-bind
>> /home/fd1026/hwloc/hwloc-1.3.2/bin/hwloc-bind
>>
>> rs0 hwloc 108 hwloc-bind socket:0.core:0 -l report-bindings.sh
>> hwloc_set_cpubind 0x00000003 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 114 hwloc-bind socket:0.core:0 -p report-bindings.sh
>> hwloc_set_cpubind 0x00000003 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 118 hwloc-bind socket:1.core:3 -l report-bindings.sh
>> hwloc_set_cpubind 0x0000c000 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 119 hwloc-bind socket:1.core:3 -p report-bindings.sh
>> hwloc_set_cpubind 0x0000c000 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>>
>> I get no error if I use "pu:*", but I don't see a difference in the
>> output. To me the output always looks the same, regardless of
>> "pu:0", ..., "pu:15".
>>
>> rs0 hwloc 120 hwloc-bind pu:0 -l report-bindings.sh
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 121 hwloc-bind pu:0 -p report-bindings.sh
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>>
>> Now the same things for hwloc-1.5:
>>
>> rs0 hwloc 106 which hwloc-bind
>> /usr/local/bin/hwloc-bind
>>
>> rs0 hwloc 107 hwloc-bind socket:0.core:0 -l report-bindings.sh
>> hwloc_set_cpubind 0x00000003 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 108 hwloc-bind socket:0.core:0 -p report-bindings.sh
>> hwloc_set_cpubind 0x00000003 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 109 hwloc-bind socket:1.core:3 -l report-bindings.sh
>> hwloc_set_cpubind 0x0000c000 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 110 hwloc-bind socket:1.core:3 -p report-bindings.sh
>> hwloc_set_cpubind 0x0000c000 failed (errno 18 Cross-device link)
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>>
>> rs0 hwloc 112 hwloc-bind pu:0 -l report-bindings.sh
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> rs0 hwloc 113 hwloc-bind pu:0 -p report-bindings.sh
>> MCW rank  (rs0.informatik.hs-fulda.de): Socket:1024.Core:0.PU:0 
>> Socket:1024.Core:0.PU:1 
>> Socket:1024.Core:2.PU:2 Socket:1024.Core:2.PU:3 Socket:1024.Core:4.PU:4 
>> Socket:1024.Core:4.PU:5 Socket:1024.Core:6.PU:6 Socket:1024.Core:6.PU:7 
>> Socket:1032.Core:8.PU:8 Socket:1032.Core:8.PU:9 Socket:1032.Core:10.PU:10 
>> Socket:1032.Core:10.PU:11 Socket:1032.Core:12.PU:12 
>> Socket:1032.Core:12.PU:13 
>> Socket:1032.Core:14.PU:14 Socket:1032.Core:14.PU:15
>>
>> Is the above output helpful? Thank you very much in advance for your help.
>> Do you know a C++ application which I can try to test our compiler?
>>
>>
>> Kind regards
>>
>> Siegmar
>>
>>
>> ##########################################################################
>> #                                                                        #
>> # Hochschule Fulda          University of Applied Sciences               #
>> # FB Angewandte Informatik  Department of Applied Computer Science       #
>> #                                                                        #
>> # Prof. Dr. Siegmar Gross   Tel.:   +49 (0)661 9640 - 333                #
>> #                           Fax:    +49 (0)661 9640 - 349                #
>> # Marquardstr. 35           WWW:    http://www.hs-fulda.de/~gross        #
>> #                           E-Mail: siegmar.gr...@informatik.hs-fulda.de #
>> # D-36039 Fulda                                                          #
>> #                                                                        #
>> ##########################################################################
>>
>
