Thanks, I copied useful information from this thread and some links to
    https://github.com/open-mpi/hwloc/issues/143

However, not sure I'll have time to look at this in the near future :/

Brice




On 07/01/2016 09:03, Matthias Reich wrote:
> Hello,
>
> To check whether kstat is able to report the psrset definitions, I
> defined a set consisting of 2 CPUs, CPU1 and CPU2 (psrset -c 1-2). The
> remaining CPUs (CPU0, CPU3..CPU23) were left undefined.
>
> On the machine, we can execute the "kstat" command and receive (among
> 1000s of lines) the following info:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   70
>         avenrun_1min                    53
>         avenrun_5min                    47
>         crtime                          0
>         ncpus                           22
>         runnable                        1146912
>         snaptime                        80083.491239257
>         updates                         790784
>         waiting                         0
>
>
> module: unix                            instance: 1
> name:   pset                            class:    misc
>         avenrun_15min                   0
>         avenrun_1min                    0
>         avenrun_5min                    0
>         crtime                          79983.070416351
>         ncpus                           2
>         runnable                        0
>         snaptime                        80083.595839172
>         updates                         1005
>         waiting                         0
>
> which is not very comprehensive and doesn't even tell which CPUs are
> part of the particular set, but could probably be used to at least warn
> about the existence of a CPU set and prevent the (not very intuitive)
> error message and consequent abort.
>
> However, doing the same on the machine without the pset defined, we get:
>
> module: unix                            instance: 0
> name:   pset                            class:    misc
>         avenrun_15min                   50
>         avenrun_1min                    38
>         avenrun_5min                    41
>         crtime                          0
>         ncpus                           24
>         runnable                        1163866
>         snaptime                        81105.346688035
>         updates                         801003
>         waiting                         0
>
> so the (only) processor set encompasses all 24 (virtual) cores. This
> may be the key to check for.
>
> The C-API to check for processor set(s) is available through the
> libpool library, which allows more resource pool configuration than
> just processor sets, but can probably act as an abstraction layer for
> different Solaris flavors...
>
> Matthias
>
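The check described above (compare the per-instance "ncpus" values of the unix:*:pset kstat blocks) can be sketched by parsing the text that the kstat command prints. This is only a minimal sketch, not what hwloc does: the function names are hypothetical, the sample strings are abbreviated from the output quoted in this thread, and it assumes that instance 0 is the default set that holds every CPU not assigned elsewhere, as the two outputs suggest.

```python
import re

# Abbreviated copies of the two kstat outputs quoted in this thread.
WITH_PSET = """
module: unix                            instance: 0
name:   pset                            class:    misc
        crtime                          0
        ncpus                           22

module: unix                            instance: 1
name:   pset                            class:    misc
        crtime                          79983.070416351
        ncpus                           2
"""

WITHOUT_PSET = """
module: unix                            instance: 0
name:   pset                            class:    misc
        crtime                          0
        ncpus                           24
"""

MODULE_RE = re.compile(r"^module:\s+(\S+)\s+instance:\s+(\d+)$")
NAME_RE = re.compile(r"^name:\s+(\S+)\s+class:\s+(\S+)$")


def parse_pset_kstat(text):
    """Return {instance: ncpus} for every unix:*:pset block in kstat output."""
    psets = {}
    module = instance = name = None
    for raw in text.splitlines():
        line = raw.strip()
        m = MODULE_RE.match(line)
        if m:
            # A "module:" line starts a new kstat block.
            module, instance, name = m.group(1), int(m.group(2)), None
            continue
        m = NAME_RE.match(line)
        if m:
            name = m.group(1)
            continue
        fields = line.split()
        # Other kstat blocks may also carry an "ncpus" statistic, so only
        # accept it inside a unix/pset block.
        if module == "unix" and name == "pset" and len(fields) == 2 \
                and fields[0] == "ncpus":
            psets[instance] = int(fields[1])
    return psets


def has_user_psets(psets):
    """True if any set other than instance 0 (assumed default) exists."""
    return any(instance != 0 for instance in psets)


if __name__ == "__main__":
    for label, text in (("with psrset", WITH_PSET), ("without", WITHOUT_PSET)):
        psets = parse_pset_kstat(text)
        print(label, psets, "user-defined sets:", has_user_psets(psets))
```

As noted above, this only reveals that a set exists and how many CPUs it holds, not which CPUs belong to it, so at best it could drive a warning.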
>>  Hello
>> So processor sets are not taken into account when Solaris reports
>> topology information in kstat etc.
>> Do you know if hwloc can query processor sets from the C interface?
>> If so, we could apply the processor set mask to hwloc object cpusets
>> during discovery to avoid your error.
>> Brice
>>
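To make the "intersects without inclusion" wording and the masking idea concrete, here is a small sketch using plain integers as cpusets. The two cpuset values are the ones from the error message quoted further down in this thread; the helper names and the ALLOWED mask are hypothetical (the mask simply models 24 PUs with CPU1 and CPU2 taken out, as in the psrset -c 1-2 experiment). In hwloc itself the corresponding operations are hwloc_bitmap_intersects(), hwloc_bitmap_isincluded() and hwloc_bitmap_and().

```python
def intersects(a, b):
    """True if the two cpusets share at least one PU."""
    return (a & b) != 0


def includes(outer, inner):
    """True if every PU of inner is also in outer."""
    return (inner & ~outer) == 0


def conflicts(a, b):
    """Overlap where neither set contains the other: the reported error."""
    return intersects(a, b) and not includes(a, b) and not includes(b, a)


def restrict(cpuset, allowed):
    """Apply a processor-set mask to an object cpuset."""
    return cpuset & allowed


# Values taken from the error message in this thread.
CORE = 0x00001001   # Core P#0
NUMA = 0x0003C001   # NUMANode P#1

# Hypothetical mask: 24 PUs with CPU1 and CPU2 removed.
ALLOWED = 0xFFFFFF & ~0b110

if __name__ == "__main__":
    print("shared PUs:        ", hex(CORE & NUMA))
    print("core PUs outside:  ", hex(CORE & ~NUMA))
    print("conflict:          ", conflicts(CORE, NUMA))
    print("core after masking:", hex(restrict(CORE, ALLOWED)))
```

Whether masking alone would have avoided the reported error is not established in this thread; the sketch only shows the bit operations involved.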
>> On 05/01/2016 10:18, Karl Behler wrote:
>>> There was a processor set defined (command psrset) on this machine.
>>> Having removed the psrset, hwloc-info produces a result without error
>>> messages:
>>>
>>> hwloc-info -v
>>> depth 0:        1 Machine (type #1)
>>>  depth 1:       2 NUMANode (type #2)
>>>   depth 2:      2 Package (type #3)
>>>    depth 3:     12 Core (type #5)
>>>     depth 4:    24 PU (type #6)
>>>
>>> It seems the concept of defining a psrset is in contradiction to what
>>> hwloc and/or openmpi expects/allows.
>>>
>>>
>>> On 04.01.16 18:16, Karl Behler wrote:
>>>> We used to run our MPI application with the SUNWhpc implementation
>>>> from Sun/Oracle. (This was derived from openmpi 1.5.)
>>>> However, the Oracle HPC implementation fails for the new Solaris 11.3
>>>> platform.
>>>> So we downloaded and made openmpi 1.10.1 on this platform from
>>>> scratch.
>>>>
>>>> All seems fine and a simple test application runs fine.
>>>> However, with the real application we are running into a hwloc
>>>> problem.
>>>>
>>>> So we also downloaded and made the hwloc package 1.11.2.
>>>>
>>>> Now examining hardware locality we get the following error:
>>>>
>>>> hwloc-info -v --whole-io
>>>> ****************************************************************************
>>>> * hwloc 1.11.2 has encountered what looks like an error from the operating system.
>>>> *
>>>> * Core (P#0 cpuset 0x00001001) intersects with NUMANode (P#1 cpuset 0x0003c001) without inclusion!
>>>> * Error occurred in topology.c line 1046
>>>> *
>>>> * The following FAQ entry in the hwloc documentation may help:
>>>> *   What should I do when hwloc reports "operating system" warnings?
>>>> * Otherwise please report this error message to the hwloc user's mailing list,
>>>> * along with any relevant topology information from your platform.
>>>> ****************************************************************************
>>>>
>>>> depth 0:        1 Machine (type #1)
>>>>  depth 1:       2 Package (type #3)
>>>>   depth 2:      2 NUMANode (type #2)
>>>>    depth 3:     1 Core (type #5)
>>>>     depth 4:    24 PU (type #6)
>>>>
>>>> Since I could not find the mentioned FAQ topic I'm asking the list
>>>> for advice.
>>>>
>>>> Our system is an Oracle Solaris 11.3 (latest patch level) on an
>>>> Intel hardware platform from Sun.
>>>>
>>>> output of uname -a -> SunOS sxaug28 5.11 11.3 i86pc i386 i86pc
>>>> output of psrinfo -v ->
>>>>
>>>> Status of virtual processor 0 as of: 01/04/2016 17:10:17
>>>>   on-line since 01/04/2016 14:44:28.
>>>>   The i386 processor operates at 1600 MHz,
>>>>         and has an i387 compatible floating point processor.
>>>> Status of virtual processor 1 as of: 01/04/2016 17:10:17
>>>>   on-line since 01/04/2016 14:45:10.
>>>>   The i386 processor operates at 1600 MHz,
>>>>         and has an i387 compatible floating point processor.
>>>> .
>>>> . (similar lines removed)
>>>> .
>>>> Status of virtual processor 23 as of: 01/04/2016 17:10:17
>>>>   on-line since 01/04/2016 14:45:11.
>>>>   The i386 processor operates at 1600 MHz,
>>>>         and has an i387 compatible floating point processor.
>>>>
>>>> Following comes the script which was used to make hwloc: (used
>>>> compiler: Sunstudio 12.4, see config.log as bz2 attachment)
>>>>
>>>> setenv CFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5"
>>>> setenv CXXFLAGS "$CFLAGS"
>>>> setenv FCFLAGS "-m64 -xtarget=generic -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5"
>>>> setenv FFLAGS "$FCFLAGS"
>>>> setenv PREFIX /usr/openmpi/hwloc-1.11.2
>>>> ./configure --prefix=$PREFIX --disable-debug
>>>> dmake -j 12
>>>> # as root: make install
>>>> #        : cp -p config.status $PREFIX/config.status
>>>>
>>>> Any advice much appreciated.
>>>>
>>>> Karl
>>>>
>>>>
>>>> _______________________________________________
>>>> hwloc-users mailing list
>>>> hwloc-users_at_[hidden]
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>> Searchable archives:
>>>> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1236.php
>>>
>>>
>>> -- 
>>> Dr. Karl Behler   
>>> CODAC & IT services ASDEX Upgrade
>>> phon +49 89 3299-1351 fax 3299-961351
>>>
>>>
>>>
>
>
>
> _______________________________________________
> hwloc-users mailing list
> hwloc-us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
> Link to this post:
> http://www.open-mpi.org/community/lists/hwloc-users/2016/01/1240.php
