On 12/11/2017 00:14, Biddiscombe, John A. wrote:
> I'm allocating some large matrices, from 10k squared elements up to
> 40k squared per node.
> I'm also using membind to place pages of the matrix memory across NUMA
> nodes so that the matrix might be bound according to the kind of
> pattern at the end of this email - where each 1 or 0 corresponds to a
> 256x256 block of memory.
>
> The way I'm doing this is by calling hwloc_set_area_membind_nodeset
> many thousands of times after allocation, and I've found that as the
> matrices get bigger, after some N calls to area_membind I get a
> failure: it returns -1 (errno does not seem to be set to either
> ENOSYS or EXDEV), but strerror reports "Cannot allocate memory".
>
> Question 1: by calling set_area_membind too many times, am I consuming
> some kernel resource in the memory tables that is being exhausted?
>

Hello

That's likely what's happening. Each set_area() call may create a new
"virtual memory area" (VMA). The kernel merges a new area with its
neighbors if they are bound to the same NUMA node; otherwise it keeps a
separate VMA. The per-process limit on VMAs is vm.max_map_count, which
defaults to roughly 64k (65530), so I guess you're exhausting it, which
would explain the ENOMEM ("Cannot allocate memory").
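
If you want to confirm that from inside the program, here is a minimal
Linux-only sketch (nothing hwloc-specific; the procfs paths are real,
the rest is mine): each line of /proc/self/maps is one VMA, and
/proc/sys/vm/max_map_count holds the per-process limit.

#include <stdio.h>

/* one line in /proc/self/maps == one VMA of the current process */
static long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    long n = 0;
    int c;
    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

int main(void)
{
    long vmas = count_lines("/proc/self/maps");
    long limit = -1;
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    if (f) {
        if (fscanf(f, "%ld", &limit) != 1)
            limit = -1;
        fclose(f);
    }
    printf("VMAs in use: %ld, vm.max_map_count: %ld\n", vmas, limit);
    return 0;
}

If the VMA count climbs toward the limit while the set_area() loop runs,
that's the resource being exhausted. Raising vm.max_map_count with sysctl
only pushes the problem further away.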

> Question 2: Is there a better way of achieving the result I'm looking
> for (such as a call to membind with a stride of some kind, to put N
> pages in a row on each domain in alternation)?

Unfortunately, the interleave policy doesn't have a stride argument.
It's one page on node 0, one page on node 1, etc.
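
For reference, here is what the interleave policy looks like with the
nodeset call you already use (a sketch against the hwloc 1.x API; buffer
size and node indexes are placeholders): a single call covers the whole
buffer, so it stays one VMA, but pages simply alternate across the nodes
in the set, with no way to request 256x256-block strides.

#include <hwloc.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    size_t len = 256UL * 1024 * 1024;     /* placeholder buffer size */
    void *buf = NULL;
    /* mbind() under the hood wants a page-aligned address */
    if (posix_memalign(&buf, sysconf(_SC_PAGESIZE), len))
        return 1;

    hwloc_nodeset_t nodes = hwloc_bitmap_alloc();
    hwloc_bitmap_set(nodes, 0);           /* NUMA nodes 0 and 1 (OS indexes) */
    hwloc_bitmap_set(nodes, 1);

    /* one VMA; pages go node 0, node 1, node 0, node 1, ... */
    hwloc_set_area_membind_nodeset(topo, buf, len, nodes,
                                   HWLOC_MEMBIND_INTERLEAVE, 0);

    hwloc_bitmap_free(nodes);
    free(buf);
    hwloc_topology_destroy(topo);
    return 0;
}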

The only idea I have is to use the first-touch policy: make sure the
buffer isn't in physical memory yet, then have a thread running on node 0
touch the "0" pages and another thread on node 1 touch the "1" pages.

Brice


>
> Many thanks
>
> JB
>
>
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 0000000000000000111111111111111100000000
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> 1111111111111111000000000000000011111111
> ... etc
>
>
>
>