Hi, good, you found the kernel limit that is being exceeded.
/proc/meminfo reports MemFree: 47834588 kB

numactl -H reports:

available: 2 nodes (0-1)
node 0 size: 24194 MB
node 0 free: 22702 MB
node 1 size: 24240 MB
node 1 free: 23997 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

Are you able to reproduce the error using my attached code? (A minimal
single-threaded sketch of such a reproducer is included at the very end of
this message.)

Another question: I'm trying the same code on another system, but hwloc
gives "Function not implemented". Maybe because the numa-devel package is
not installed? The non-devel numa package is already installed.

Thanks.

2012/9/6 Brice Goglin <brice.gog...@inria.fr>

> On 06/09/2012 14:51, Gabriele Fatigati wrote:
>
> Hi Brice,
>
> the initial grep is:
>
> numa_policy  65671  65952  24  144  1 : tunables  120  60  8 : slabdata  458  458  0
>
> When set_membind fails, it is:
>
> numa_policy  482  1152  24  144  1 : tunables  120  60  8 : slabdata  8  8  288
>
> What does it mean?
>
>
> The first number is the number of active objects. That means 65000
> mempolicy objects were in use on the first line.
> (I wonder if you swapped the lines, I expected higher numbers at the end
> of the run.)
>
> Anyway, having 65000 mempolicies in use is a lot. And that would somehow
> correspond to the number of set_area_membind calls that succeed before one
> fails. So the kernel might indeed fail to merge those.
>
> That said, these objects are small (24 bytes here if I am reading things
> correctly), so we're talking about only 1.6 MB here. So there's still
> something else eating all the memory. /proc/meminfo (MemFree) and numactl
> -H should again help.
>
>
> Brice
>
>
> 2012/9/6 Brice Goglin <brice.gog...@inria.fr>
>
>> On 06/09/2012 12:19, Gabriele Fatigati wrote:
>>
>> I didn't find any strange number in /proc/meminfo.
>>
>> I've noted that the program fails after exactly 65479 calls to
>> hwloc_set_area_membind, so it sounds like some kernel limit. You can
>> check that with just one thread as well.
>>
>> Maybe nobody has noticed this before, because usually we bind a large
>> amount of contiguous memory a few times, instead of small and
>> non-contiguous pieces of memory many, many times.. :(
>>
>>
>> If you have root access, try (as root)
>> watch -n 1 grep numa_policy /proc/slabinfo
>> Put a sleep(10) in your program when set_area_membind() fails, and don't
>> let your program exit before you can read the content of /proc/slabinfo.
>>
>> Brice
>>
>>
>> 2012/9/6 Brice Goglin <brice.gog...@inria.fr>
>>
>>> On 06/09/2012 10:44, Samuel Thibault wrote:
>>> > Gabriele Fatigati, on Thu 06 Sep 2012 10:12:38 +0200, wrote:
>>> >> mbind hwloc_linux_set_area_membind() fails:
>>> >>
>>> >> Error from HWLOC mbind: Cannot allocate memory
>>> > Ok.
>>> > mbind is not really supposed to allocate much memory, but it still
>>> > does allocate some, to record the policy.
>>> >
>>> >>     // hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, tid);
>>> >>     hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
>>> >>     hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
>>> >>     hwloc_bitmap_singlify(cpuset);
>>> >>     hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);
>>> >>
>>> >>     for (i = chunk*tid; i < len; i += PAGE_SIZE) {
>>> >>         // res = hwloc_set_area_membind_nodeset(topology, &array[i], PAGE_SIZE, obj->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>>> >>         res = hwloc_set_area_membind(topology, &array[i], PAGE_SIZE, cpuset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
>>> > and I'm afraid that calling set_area_membind for each page might be too
>>> > dense: the kernel is probably allocating a memory policy record for each
>>> > page, not being able to merge adjacent equal policies.
>>>
>>> It's supposed to merge VMAs with the same policy (from what I understand
>>> in the code), but I don't know if that actually works.
>>> Maybe Gabriele found a kernel bug :)
>>>
>>> Brice

-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it                    Tel: +39 051 6171722

g.fatigati [AT] cineca.it
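
For reference, here is a minimal, single-threaded sketch of the kind of
reproducer discussed above: it binds a buffer page by page with
hwloc_set_area_membind and sleeps when a call fails, so that the numa_policy
line of /proc/slabinfo and /proc/meminfo can be inspected while the
mempolicies are still alive, as Brice suggests. The buffer size (npages) and
the error handling are illustrative assumptions, not the actual attached
test case.

    #include <hwloc.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        /* Bind the calling thread to the first PU, as in the original loop (tid = 0). */
        hwloc_obj_t obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
        hwloc_cpuset_t cpuset = hwloc_bitmap_dup(obj->cpuset);
        hwloc_bitmap_singlify(cpuset);
        hwloc_set_cpubind(topology, cpuset, HWLOC_CPUBIND_THREAD);

        /* npages is an illustrative value chosen to exceed the ~65479 successful
         * calls reported above; the real test case may use a different size. */
        const size_t npages = 100000;
        const size_t page_size = (size_t) sysconf(_SC_PAGESIZE);
        const size_t len = npages * page_size;

        /* mbind() expects page-aligned addresses, so allocate aligned memory. */
        char *array = NULL;
        if (posix_memalign((void **) &array, page_size, len) != 0) {
            perror("posix_memalign");
            return 1;
        }

        size_t i;
        for (i = 0; i < len; i += page_size) {
            int res = hwloc_set_area_membind(topology, &array[i], page_size, cpuset,
                                             HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_THREAD);
            if (res < 0) {
                fprintf(stderr, "set_area_membind failed at page %zu: %s\n",
                        i / page_size, strerror(errno));
                /* Keep the mempolicies alive so that
                 *   watch -n 1 grep numa_policy /proc/slabinfo
                 * and /proc/meminfo can be read before the process exits. */
                sleep(10);
                break;
            }
        }

        free(array);
        hwloc_bitmap_free(cpuset);
        hwloc_topology_destroy(topology);
        return 0;
    }

Compiled with something like "gcc repro.c -o repro -lhwloc" and run alongside
the watch command above (as root), this should show the numa_policy active
object count growing with each successful call until the failure point.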