> The ACPI SLIT table (reported by numactl -H) was indeed often dumb or even
> wrong. But SLIT wasn't widely used anyway, so vendors didn't care much
> about putting valid info there; it didn't break anything in most
> applications. Hopefully it won't be the case for HMAT because HMAT will be
> the official way to figure out which target memory is fast or not. If
> vendors don't fill it properly, the OS may use HBM or NVDIMMs by default
> instead of DDR, which will likely cause more problems than a broken SLIT.
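For readers who want to look at the SLIT values discussed here on their own machine: Linux exposes the same distances that numactl -H prints in sysfs, as one space-separated distance file per NUMA node (/sys/devices/system/node/nodeN/distance). The shell function below is a minimal sketch, not part of hwloc; its name and arguments are hypothetical. It lists node pairs whose distance exceeds a threshold (30 by default, the kernel value cited in the reply that follows) and assumes node IDs are contiguous from 0, so that the j-th value in nodeN/distance is the distance to node j.

```shell
#!/bin/sh
# Hypothetical helper: list NUMA node pairs whose SLIT distance, as exported
# by Linux in sysfs, is greater than a threshold.
# Assumption: node IDs are contiguous from 0, so the j-th value in
# nodeN/distance is the distance from node N to node j.
far_node_pairs() {
  dir="${1:-/sys/devices/system/node}"   # sysfs node directory (overridable)
  threshold="${2:-30}"                   # distance threshold, 30 by default
  for f in "$dir"/node[0-9]*/distance; do
    [ -r "$f" ] || continue              # glob did not match or file unreadable
    i="${f%/distance}"                   # drop the trailing /distance
    i="${i##*/node}"                     # keep only the numeric node ID
    j=0
    for d in $(cat "$f"); do
      if [ "$d" -gt "$threshold" ]; then
        echo "node$i -> node$j: $d"
      fi
      j=$((j + 1))
    done
  done
}
```

Running far_node_pairs with no arguments reads the live sysfs tree; passing a directory and a threshold makes it possible to try it on a copied or hand-made tree.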
Right. Even now, SLIT values have an impact on the Linux scheduler. See this:
https://www.codeblueprint.co.uk/2019/07/12/what-are-slit-tables.html

"The current magic value used inside Linux kernel is 30 – if the NUMA node
distance between two nodes is more than 30, the Linux kernel scheduler will
try not to migrate tasks between them."

https://github.com/torvalds/linux/blob/master/include/linux/topology.h#L60

> There's an example at the end of the manpage of hwloc-annotate. It's very
> similar to your line, but you likely need a capital in "Bandwidth".

Yes, it works as expected when used with the capital "B". See [1].

> I'll see if I can make things case-insensitive in the tools (not in the C
> API).

Yes, it would be a nice improvement. Currently, there is a mismatch between
different commands: hwloc-info supports both bandwidth and Bandwidth, but
hwloc-annotate requires a capital letter.

hwloc-info --best-memattr bandwidth
hwloc-info --best-memattr Bandwidth
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:0 18 && mv out.xml in.xml

Thank you very much!
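The sixteen hwloc-annotate calls in [1] follow a single pattern, so they can be generated from the measured matrix instead of being typed by hand. The sketch below is hypothetical (the function names are illustrative, not hwloc tools): canon_memattr maps any capitalization of the four standard attribute names to the form printed by lstopo --memattrs, which sidesteps the case sensitivity hwloc-annotate showed here, and gen_memattr_cmds prints one command per (target, initiator) pair. Following the argument order in [1], row i of the input is taken as the values for target node:i and column j as initiator node:j. The commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here is part of hwloc itself.

# Map any capitalization of the four standard memory attribute names to the
# spelling shown by "lstopo --memattrs". Other names are passed through as-is.
canon_memattr() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    capacity)  echo Capacity ;;
    locality)  echo Locality ;;
    bandwidth) echo Bandwidth ;;
    latency)   echo Latency ;;
    *)         echo "$1" ;;   # user-defined attribute: leave untouched
  esac
}

# Print one hwloc-annotate command per matrix cell read from stdin.
# $1 = XML file updated in place (via a scratch file), $2 = attribute name.
# Row i = values for target node:i; column j = initiator node:j.
gen_memattr_cmds() {
  xml="$1"
  attr="$(canon_memattr "$2")"
  tmp="$xml.tmp"                          # hypothetical scratch file name
  i=0
  # "|| [ -n "$row" ]" also handles a last line without a trailing newline.
  while read -r row || [ -n "$row" ]; do
    [ -n "$row" ] || continue             # skip blank lines
    j=0
    for val in $row; do
      echo "hwloc-annotate $xml $tmp node:$i memattr $attr node:$j $val && mv -f $tmp $xml"
      j=$((j + 1))
    done
    i=$((i + 1))
  done
}
```

For the matrix used in [1], printf '18 9 9 9\n9 18 9 9\n9 9 18 9\n9 9 9 18\n' | gen_memattr_cmds in.xml bandwidth | sh would apply the same sixteen annotations, with in.xml.tmp as the scratch file. The name mapping only helps for the four standard attributes; user-registered attributes must still be spelled exactly as registered.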
Jirka

[1]
lstopo in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:0 18 && mv out.xml in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:1 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:2 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:3 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:0 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:1 18 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:2 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:3 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:0 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:1 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:2 18 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:3 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:0 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:1 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:2 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:3 18 && mv -f out.xml in.xml

$ lstopo-no-graphics --input in.xml --memattrs
Memory attribute #0 name `Capacity' flags 1
  NUMANode L#0 = 16469168128
  NUMANode L#1 = 16908922880
  NUMANode L#2 = 16881680384
  NUMANode L#3 = 16908451840
Memory attribute #1 name `Locality' flags 2
  NUMANode L#0 = 8
  NUMANode L#1 = 8
  NUMANode L#2 = 8
  NUMANode L#3 = 8
Memory attribute #2 name `Bandwidth' flags 5
  NUMANode L#0 = 18 (NUMANode L#0)
  NUMANode L#0 = 9 (NUMANode L#1)
  NUMANode L#0 = 9 (NUMANode L#2)
  NUMANode L#0 = 9 (NUMANode L#3)
  NUMANode L#1 = 9 (NUMANode L#0)
  NUMANode L#1 = 18 (NUMANode L#1)
  NUMANode L#1 = 9 (NUMANode L#2)
  NUMANode L#1 = 9 (NUMANode L#3)
  NUMANode L#2 = 9 (NUMANode L#0)
  NUMANode L#2 = 9 (NUMANode L#1)
  NUMANode L#2 = 18 (NUMANode L#2)
  NUMANode L#2 = 9 (NUMANode L#3)
  NUMANode L#3 = 9 (NUMANode L#0)
  NUMANode L#3 = 9 (NUMANode L#1)
  NUMANode L#3 = 9 (NUMANode L#2)
  NUMANode L#3 = 18 (NUMANode L#3)
Memory attribute #3 name `Latency' flags 6

On Fri, Oct 2, 2020 at 12:43 AM Brice Goglin <brice.gog...@inria.fr> wrote:

> On 01/10/2020 at 22:17, Jirka Hladky wrote:
>
> > This is interesting! ACPI tables are often wrong; having the option to
> > annotate more accurate data to hwloc is great.
>
> The ACPI SLIT table (reported by numactl -H) was indeed often dumb or even
> wrong. But SLIT wasn't widely used anyway, so vendors didn't care much
> about putting valid info there; it didn't break anything in most
> applications. Hopefully it won't be the case for HMAT because HMAT will be
> the official way to figure out which target memory is fast or not. If
> vendors don't fill it properly, the OS may use HBM or NVDIMMs by default
> instead of DDR, which will likely cause more problems than a broken SLIT.
>
> > We have a simple C program to measure the bandwidth between NUMA nodes,
> > producing a table similar to the output of numactl -H (but with values in
> > GB/s).
> >
> > node   0   1   2   3
> >   0:  10  16  16  16
> >   1:  16  10  16  16
> >   2:  16  16  10  16
> >   3:  16  16  16  10
> >
> > I was trying to annotate it using hwloc-annotate, but I have not
> > succeeded:
> >
> > lstopo in.xml
> > hwloc-annotate in.xml out.xml node:0 memattr bandwidth node:0 18
> > Failed to find memattr by name bandwidth
> >
> > Is there some example of how to do this?
>
> There's an example at the end of the manpage of hwloc-annotate. It's very
> similar to your line, but you likely need a capital in "Bandwidth". I'll
> see if I can make things case-insensitive in the tools (not in the C API).
>
> > Also, are there any plans for having a tool which would measure the
> > memory bandwidth and annotate the results to XML for later use with hwloc
> > commands?
>
> We've been talking about this for years. Having a good performance
> measurement tool isn't easy. I see people sending patches for adding some
> assembly because this corner case on this processor isn't well optimized by
> GCC :/ I am not sure we want to put this inside hwloc.
>
> Brice
>
> On Thu, Oct 1, 2020 at 7:28 PM Brice Goglin <brice.gog...@inria.fr> wrote:
>
>> On 01/10/2020 at 19:16, Jirka Hladky wrote:
>>
>> Hi Brice,
>>
>> this new feature sounds very interesting!
>>
>>> Add hwloc/memattrs.h for exposing latency/bandwidth information
>>> between initiators (CPU sets for now) and target NUMA nodes,
>>> typically on heterogeneous platforms.
>>
>> If I get it right, I need to have an ACPI HMAT table on the system to use
>> the new functionality, right?
>>
>> Hello Jirka,
>>
>> It's also possible to add memory attributes using the C API or with
>> hwloc-annotate to modify an XML file (you may create an attribute, or add
>> values for a given attribute).
>>
>> I have tried the following on Fedora:
>> acpidump -o acpidump.bin
>> acpixtract -a acpidump.bin
>>
>> but there is no HMAT table reported. So it seems I'm out of luck, and I
>> cannot test the new functionality, right?
>>
>> Besides KNL (which is too old to have HMAT, but hwloc now provides
>> hardwired bandwidth/latency values), the only platforms with heterogeneous
>> memories right now are Intel machines with Optane DCPMM (NVDIMMs). Some
>> have a HMAT, some don't. If your machine doesn't, it's possible to provide
>> a custom HMAT table in the initrd. That's not easy, so adding attribute
>> values with hwloc-annotate might be easier.
>>
>> Also, where can we find the list of attributes supported
>> by --best-memattr?
>> --best-memattr <attr> Only display the best target among the local nodes
>>
>> There are 4 standard attributes defined in hwloc/memattrs.h: capacity,
>> locality, latency and bandwidth. They are also visible in lstopo -vv or
>> lstopo --memattrs. I'll add something to the doc.
>>
>> By trial and error, I have found out that latency and bandwidth are
>> supported. Are there any others? Could you please add the list to
>> hwloc-info -h?
>>
>> I could add the default ones, but I'll need to specify that additional
>> user-given attributes may exist.
>>
>> Thanks for the feedback.
>>
>> Brice
>>
>> hwloc-info --best-memattr bandwidth
>> hwloc-info --best-memattr latency
>>
>> Thanks a lot!
>> Jirka
>>
>> On Thu, Oct 1, 2020 at 12:45 AM Brice Goglin <brice.gog...@inria.fr>
>> wrote:
>>
>>> hwloc (Hardware Locality) 2.3.0 is now available for download.
>>>
>>> https://www.open-mpi.org/software/hwloc/v2.3/
>>>
>>> v2.3.0 brings quite a lot of changes. The biggest one is the addition
>>> of the memory attribute API to expose hardware information that vendors
>>> are (slowly) adding to ACPI tables to describe heterogeneous memory
>>> platforms (mostly DDR+NVDIMMs right now).
>>>
>>> The following is a summary of the changes since v2.2.0.
>>>
>>> Version 2.3.0
>>> -------------
>>> * API
>>>   + Add hwloc/memattrs.h for exposing latency/bandwidth information
>>>     between initiators (CPU sets for now) and target NUMA nodes,
>>>     typically on heterogeneous platforms.
>>>     - When available, bandwidths and latencies are read from the ACPI HMAT
>>>       table exposed by Linux kernel 5.2+.
>>>     - Attributes may also be customized to expose user-defined performance
>>>       information.
>>>   + Add hwloc_get_local_numanode_objs() for listing NUMA nodes that are
>>>     local to some locality.
>>>   + The new topology flag HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT causes
>>>     support arrays to be loaded from XML exported with hwloc 2.3+.
>>>     - hwloc_topology_get_support() now returns an additional "misc"
>>>       array with feature "imported_support" set when support was imported.
>>>   + Add hwloc_topology_refresh() to refresh internal caches after modifying
>>>     the topology and before consulting the topology in a multithread
>>>     context.
>>> * Backends
>>>   + Add a ROCm SMI backend and a hwloc/rsmi.h helper file for getting
>>>     the locality of AMD GPUs, now exposed as "rsmi" OS devices.
>>>     Thanks to Mike Li.
>>>   + Remove POWER device-tree-based topology on Linux
>>>     (it was disabled by default since 2.1).
>>> * Tools
>>>   + Command-line options for specifying flags now understand comma-separated
>>>     lists of flag names (substrings).
>>>   + hwloc-info and hwloc-calc have new --local-memory, --local-memory-flags
>>>     and --best-memattr options for reporting local memory nodes and
>>>     filtering by memory attributes.
>>>   + hwloc-bind has a new --best-memattr option for filtering by memory
>>>     attributes among the memory binding set.
>>>   + Tools that have a --restrict option may now receive a nodeset or
>>>     some custom flags for restricting the topology.
>>>   + lstopo now has a --thickness option for changing line thickness in the
>>>     graphical output.
>>>   + Fix lstopo drawing when autoresizing on Windows 10.
>>>   + Pressing the F5 key in lstopo X11 and Windows graphical/interactive
>>>     outputs now refreshes the display according to the current topology
>>>     and binding.
>>>   + Add a tikz lstopo graphical backend to generate pictures that can be
>>>     easily included in LaTeX documents. Thanks to Clement Foyer.
>>> * Misc
>>>   + The default installation path of the Bash completion file has changed to
>>>     ${datadir}/bash-completion/completions/hwloc. Thanks to Tomasz Kłoczko.
>>>
>>> Changes since 2.3.0rc1 are negligible.
>>>
>>> --
>>> Brice
>>>
>>> _______________________________________________
>>> hwloc-announce mailing list
>>> hwloc-annou...@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/hwloc-announce
>>
>> --
>> -Jirka
>>
>> _______________________________________________
>> hwloc-users mailing list
>> hwloc-users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>
> --
> -Jirka

--
-Jirka