On 06/09/2012 12:19, Gabriele Fatigati wrote:
> I didn't find any strange numbers in /proc/meminfo.
>
> I've noticed that the program fails exactly every 65479 calls to
> hwloc_set_area_membind, so it sounds like some kernel limit. You can
> check that with just one thread as well.
>
> Maybe nobody has noticed this before because we usually bind a large
> amount of contiguous memory a few times, instead of small,
> non-contiguous pieces of memory many, many times. :(

If you have root access, try (as root)

    watch -n 1 grep numa_policy /proc/slabinfo

Put a sleep(10) in your program when set_area_membind() fails, and
don't let your program exit before you can read the contents of
/proc/slabinfo.

Brice

> 2012/9/6 Brice Goglin <brice.gog...@inria.fr>
>
>     On 06/09/2012 10:44, Samuel Thibault wrote:
>     > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>     >> mbind hwloc_linux_set_area_membind() fails:
>     >>
>     >> Error from HWLOC mbind: Cannot allocate memory
>     > Ok. mbind is not really supposed to allocate much memory, but it still
>     > does allocate some, to record the policy
>     >
>     >> // hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>     >> hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>     >> hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>     >> hwloc_bitmap_singlify(cpuset);
>     >> hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>     >>
>     >> for( i = chunk*tid; i < len; i+=PAGE_SIZE) {
>     >> // res = hwloc_set_area_membind_nodeset(topology, &array[i], PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>     >> res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>     > and I'm afraid that calling set_area_membind for each page might be too
>     > dense: the kernel is probably allocating a memory policy record for each
>     > page, not being able to merge adjacent equal policies.
>     >
>
>     It's supposed to merge VMAs with the same policies (from what I
>     understand of the code), but I don't know if that actually works.
>     Maybe Gabriele found a kernel bug :)
>
>     Brice
>
>     _______________________________________________
>     hwloc-users mailing list
>     hwloc-us...@open-mpi.org
>     http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>
> --
> Ing. Gabriele Fatigati
>
> HPC specialist
>
> SuperComputing Applications and Innovation Department
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
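
For reference, the advice in the reply (pause when set_area_membind() fails so that /proc/slabinfo can be read before the process exits) could be sketched roughly as below. This is only an illustration under assumptions, not code from the thread: bind_fn, bind_pages_until_failure and fake_bind are hypothetical names, and fake_bind merely simulates a call that starts failing with ENOMEM after a given number of successes, so the sketch can be tried without hwloc or a NUMA machine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback type: binds [addr, addr+len) and returns 0 on
 * success, or -1 with errno set (the convention hwloc_set_area_membind
 * follows). ctx carries whatever the real call needs. */
typedef int (*bind_fn)(void *addr, size_t len, void *ctx);

/* Bind a buffer one page at a time and return the number of calls that
 * succeeded. On the first failure, print the call count and the error,
 * then sleep for pause_seconds so /proc/slabinfo can be inspected while
 * the process is still alive. errno of the failed call is preserved. */
static size_t bind_pages_until_failure(char *buf, size_t len, size_t page_size,
                                       bind_fn do_bind, void *ctx,
                                       unsigned pause_seconds)
{
    size_t ok = 0;

    assert(page_size > 0);
    for (size_t off = 0; off < len; off += page_size) {
        size_t n = (len - off < page_size) ? len - off : page_size;

        if (do_bind(buf + off, n, ctx) != 0) {
            int saved = errno;   /* fprintf/sleep may overwrite errno */
            fprintf(stderr, "bind call %zu failed at offset %zu: %s\n",
                    ok + 1, off, strerror(saved));
            sleep(pause_seconds);
            errno = saved;
            break;
        }
        ok++;
    }
    return ok;
}

/* Stand-in for the real binding call (an assumption for illustration):
 * succeeds while the counter behind ctx is non-zero, then fails with
 * ENOMEM, mimicking the "Cannot allocate memory" error in the report. */
static int fake_bind(void *addr, size_t len, void *ctx)
{
    size_t *remaining = ctx;

    (void)addr;
    (void)len;
    if (*remaining == 0) {
        errno = ENOMEM;
        return -1;
    }
    (*remaining)--;
    return 0;
}
```

In the real program the callback would wrap the call from the quoted loop, hwloc_set_area_membind(topology, addr, len, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD), and pause_seconds would be 10 as suggested. The returned count is the number to compare against the 65479 reported above.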