Dear Brice,

Thank you for your valuable explanation. I have aligned to the system page size and it worked. I am now trying to get it working at the OpenMP thread level and hope it will work fine. In case of any confusion I will get back to you.
Sincerely appreciate your help.
- Raju

On Tue, Apr 26, 2016 at 2:19 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

> Hello
> I won't have time to debug this in detail because the code is quite
> complex and it doesn't actually build on my machines. My feeling is that
> you should first remove OpenMP entirely to avoid any issue with shared
> variables that should be private instead.
> I guess you will increase array_size significantly since the current value
> (60) allocates a single page (which means each individual membind will
> bind/move the entire page, and the last membind will win).
> Also beware that dividing by 3 could create non-page-aligned buffers,
> causing boundary pages to be bound twice (hence the last membind will win
> too).
> Brice
>
> On 25/04/2016 19:41, Rezaul Karim Raju wrote:
>
> Hi Brice,
>
> Thank you very much. I would like to attach the C code I am running.
> As I mentioned before, my intention is to bind an allocated array over
> NUMA nodes, which I am aiming to use for locality in thread-level execution.
>
> Please find the attached code file and output file.
>
> Findings:
> 1. Binding retrieval always displays the last node binding.
> 2. Is it possible to distribute the array partially (with hwloc binding) to
> specific nodes where I can exploit locality in OpenMP thread-level
> execution?
> 3. Is my guess about the binding policies not right?
>
> Do appreciate your comments much.
>
> - Raju
>
> On Mon, Apr 25, 2016 at 12:16 AM, Brice Goglin <brice.gog...@inria.fr>
> wrote:
>
>> Please replace err with errno in that line:
>> printf("Error Occured, and error no:= %d \n", err);
>>
>> You may need to #include <errno.h> in the header.
>>
>> Brice
>>
>> On 25/04/2016 00:27, Rezaul Karim Raju wrote:
>>
>> Please find attached the system layout.
>> uname -a
>> Linux crill-010 3.11.10-21-desktop #1 SMP PREEMPT Mon Jul 21 15:28:46 UTC
>> 2014 (9a9565d) x86_64 x86_64 x86_64 GNU/Linux
>>
>> and below is the code snippet where I am getting the error:
>>
>> /* Find Location of a: 3rd QUARTER */
>> err = hwloc_get_area_membind_nodeset(topology, array+ size/2, size/4,
>>     nodeset_c, &policy, HWLOC_MEMBIND_THREAD );
>> if (err < 0) {
>>     printf("Error Occured, and error no:= %d \n", err);
>>     fprintf(stderr, "failed to retrieve the buffer binding and policy\n");
>>     hwloc_topology_destroy(topology);
>>     hwloc_bitmap_free(nodeset_c);
>>     //return EXIT_FAILURE;
>> }
>>
>> *Please ignore the segfault; here it gives the error no: = -1*
>>
>> *My question: is it OK with the hwloc API to allocate an array on one NUMA
>> node and then bind parts of it over other nodes?*
>>
>> Thank you again.
>> - Raju
>>
>> On Sun, Apr 24, 2016 at 4:58 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>
>>> Please find out which line is actually causing the segfault.
>>> Run your program under gdb. Once it crashes, type "bt full" and report
>>> the output here.
>>>
>>> By the way, what kind of machine are you using? (lstopo + uname -a)
>>>
>>> Brice
>>>
>>> On 24/04/2016 23:46, Rezaul Karim Raju wrote:
>>>
>>> Hi Brice,
>>>
>>> Thank you very much for your prompt care.
>>>
>>> I am retrieving as below:
>>>
>>> nodeset_c = hwloc_bitmap_alloc();
>>>
>>> /* Find Location of a: 3rd QUARTER */
>>> err = hwloc_get_area_membind_nodeset(topology, array+ size/2,
>>>     size/4, nodeset_c, &policy, HWLOC_MEMBIND_THREAD );
>>>
>>> /* print the corresponding NUMA nodes */
>>> hwloc_bitmap_asprintf(&s, nodeset_c);
>>> printf("Address:= %p Variable:= <array [A]- 3rd quarter> bound to nodeset %s with contains:\n", (array+size/2), s);
>>> free(s);
>>> hwloc_bitmap_foreach_begin(hw_i, nodeset_c) {
>>>     obj_c = hwloc_get_numanode_obj_by_os_index(topology, hw_i);
>>>     printf("[3rd Q] node #%u (OS index %u) with %lld bytes of memory\n",
>>>         obj_c->logical_index, hw_i, (unsigned long long)
>>>         obj_c->memory.local_memory);
>>> } hwloc_bitmap_foreach_end();
>>> hwloc_bitmap_free(nodeset_c);
>>>
>>> It prints as below:
>>>
>>> error no:= -1 and segmentation fault
>>> my array size is = 262144 {data type long} and each Quarter = size/4
>>> =65536
>>> Address of array:= 0x7f350e515000, tmp:= 0x7f34fe515000, tst_array:=
>>> 0x7f34ee515000
>>> Address of array:= 0x7f350e515000, array+size/4:= 0x7f352e515000,
>>> array+size/2:= 0x7f354e515000, array+3*size/4:= 0x7f356e515000
>>>
>>> Address:= 0x7f350e515000 Variable:= <array [A] - 1st quarter> bound to
>>> nodeset 0x00000001 with contains:
>>> [1st Q] node #0 (OS index 0) with 8387047424 bytes of memory
>>> Address:= 0x7f352e515000 Variable:= <array [A]- 2nd quarter> bound to
>>> nodeset 0x00000004 with contains:
>>> [2nd Q] node #2 (OS index 2) with 8471621632 bytes of memory
>>>
>>> in case of [3rd Q]
>>> Error Occured, and error no:= -1 and segmentation fault happened.
>>>
>>> Thanks.!
>>>
>>> On Sun, Apr 24, 2016 at 4:08 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>
>>>> Hello,
>>>> What do you mean with " it can not bind the specified memory section
>>>> (addr, len) to the desired NUMA node"?
>>>> Did it fail? If so, what does errno contain?
>>>> If it didn't fail, what did it do instead?
>>>> thanks
>>>> Brice
>>>>
>>>> On 24/04/2016 23:02, Rezaul Karim Raju wrote:
>>>>
>>>> Hi ...
>>>>
>>>> I was trying to bind each quarter of an array to 4 different NUMA
>>>> nodes, and doing as below:
>>>>
>>>> //ALLOCATION
>>>> obj_a = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, 0);
>>>>
>>>> array = hwloc_alloc_membind_nodeset( topology, size, obj_a->nodeset,
>>>>     HWLOC_MEMBIND_BIND, 1);
>>>> tmp = hwloc_alloc_membind_nodeset( topology, size, obj_a->nodeset,
>>>>     HWLOC_MEMBIND_BIND, 1);
>>>>
>>>> // DISTRIBUTED BINDING [my system has 8 NUMA nodes (0-7)]
>>>> printf("Address of array:= %p, array+size/4:= %p, array+size/2:= %p, array+3*size/4:= %p \n", array, array+size/4, array+size/2,
>>>>     array+3*size/4);
>>>> // bind 1st quarter to node (n-1)
>>>> hwloc_set_area_membind_nodeset(topology, (array), size/4,
>>>>     obj_a->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>> hwloc_set_area_membind_nodeset(topology, (tmp), size/4, obj_a->nodeset,
>>>>     HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>> // bind 2nd quarter to node (2)
>>>> obj_b = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, 2);
>>>> hwloc_set_area_membind_nodeset(topology, (array+size/4), size/4,
>>>>     obj_b->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>> hwloc_set_area_membind_nodeset(topology, (tmp +size/4), size/4,
>>>>     obj_b->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>>
>>>> // bind 3rd quarter to node (4)
>>>> obj_c = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, 4);
>>>> hwloc_set_area_membind_nodeset(topology, array+size/2, size/4,
>>>>     obj_c->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>> hwloc_set_area_membind_nodeset(topology, tmp+size/2, size/4,
>>>>     obj_c->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>> // bind 4th quarter to node (6)
>>>> obj_d = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, 6);
>>>> hwloc_set_area_membind_nodeset(topology, array+3*size/4, size/4,
>>>>     obj_d->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>> hwloc_set_area_membind_nodeset(topology, tmp+3*size/4, size/4,
>>>>     obj_d->nodeset, HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_MIGRATE);
>>>>
>>>> My intention here is to distribute 'array' over nodes through hwloc
>>>> memory binding. Its elements are of type long; it was originally allocated as:
>>>> array = (ELM *) malloc(bots_arg_size * sizeof(ELM));
>>>> tmp = (ELM *) malloc(bots_arg_size * sizeof(ELM));
>>>>
>>>> 1) But except for obj_a, it cannot bind the specified memory
>>>> section (addr, len) to the desired NUMA node.
>>>> 2) I also tried the MEMBIND_INTERLEAVE policy:
>>>> array = hwloc_alloc_membind_nodeset(topology, size, cset_available,
>>>>     HWLOC_MEMBIND_INTERLEAVE, HWLOC_MEMBIND_MIGRATE);
>>>> tmp = hwloc_alloc_membind_nodeset(topology, size, cset_available,
>>>>     HWLOC_MEMBIND_INTERLEAVE, HWLOC_MEMBIND_MIGRATE);
>>>> but I did get it working here as well.
>>>>
>>>> *Can you please comment on this..?*
>>>>
>>>> Thank you very much in advance..!!
>>>> - Raju
>>>>
>>>> On Mon, Mar 21, 2016 at 11:25 AM, Rezaul Karim Raju <raju.cse.b...@gmail.com> wrote:
>>>>
>>>>> Thanks, Brice.!
>>>>>
>>>>> On Mon, Mar 21, 2016 at 11:22 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>>>
>>>>>> For testing, you can use this tarball:
>>>>>> https://ci.inria.fr/hwloc/job/zcustombranch-0-tarball/lastSuccessfulBuild/artifact/hwloc-getmemlocation-20160320.2208.gitd2f6537.tar.gz
>>>>>>
>>>>>> On 21/03/2016 17:21, Rezaul Karim Raju wrote:
>>>>>>
>>>>>> Hi Brice,
>>>>>>
>>>>>> Thanks for your email.
>>>>>> I believe it is definitely helpful. Getting the memory range within the
>>>>>> current process will be very good information to drill down.
>>>>>> Let me use this and I will get back if I have any clarification/comment.
>>>>>>
>>>>>> Regards-
>>>>>> Raju
>>>>>>
>>>>>> On Sun, Mar 20, 2016 at 4:26 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>>>>
>>>>>>> I just pushed a proposal, see
>>>>>>> https://github.com/open-mpi/hwloc/issues/97
>>>>>>>
>>>>>>> Brice
>>>>>>>
>>>>>>> On 18/12/2015 20:45, Brice Goglin wrote:
>>>>>>>
>>>>>>> Yes, we're "thinking" about it. But there are open questions as
>>>>>>> mentioned in the github issue.
>>>>>>> By the way, we wouldn't return NULL in case of
>>>>>>> non-physically-allocated buffer, but rather set the output nodeset to 0.
>>>>>>> You should comment on the issue directly, instead of replying here,
>>>>>>> otherwise your comments may get lost.
>>>>>>>
>>>>>>> Brice
>>>>>>>
>>>>>>> On 18/12/2015 18:57, Rezaul Karim Raju wrote:
>>>>>>>
>>>>>>> Hi Brice,
>>>>>>>
>>>>>>> Thanks for your time and nice explanation.
>>>>>>> I have looked at the issue about returning the location (the page proportion
>>>>>>> across multiple nodes & physical allocation). Are you thinking of adding
>>>>>>> this function? For example, returning the node or list of nodes where the
>>>>>>> array is allocated (only if physically allocated, otherwise NULL): is that
>>>>>>> possible?
>>>>>>>
>>>>>>> I am looking to get the physical location of data allocated by the
>>>>>>> OS default policy. I would appreciate any better idea; please share it with me.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> - Raju
>>>>>>>
>>>>>>> On Tue, Dec 15, 2015 at 3:28 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>>>>>>
>>>>>>>> On 15/12/2015 07:21, Brice Goglin wrote:
>>>>>>>>
>>>>>>>> On 15/12/2015 05:57, Rezaul Karim Raju wrote:
>>>>>>>>
>>>>>>>> OUTPUT:
>>>>>>>> Policy --> buffer(Array: A) membind [default OS binding] Policy
>>>>>>>> is:= 1 [1 refers to HWLOC_MEMBIND_FIRSTTOUCH
>>>>>>>> <https://www.open-mpi.org/projects/hwloc/doc/v1.11.1/a00083.php#ggac9764f79505775d06407b40f5e4661e8a979c7aa78dd32780858f30f47a72cca0>]
>>>>>>>> Nodeset --> buffer(Array: A) bound to nodeset 0x000000ff with
>>>>>>>> contains:
>>>>>>>> node #0 (OS index 0) with 8387047424 bytes of memory
>>>>>>>> node #1 (OS index 1) with 8471617536 bytes of memory
>>>>>>>> node #2 (OS index 2) with 8471621632 bytes of memory
>>>>>>>> node #3 (OS index 3) with 8471617536 bytes of memory
>>>>>>>> node #4 (OS index 4) with 8471621632 bytes of memory
>>>>>>>> node #5 (OS index 5) with 8471617536 bytes of memory
>>>>>>>> node #6 (OS index 6) with 8471621632 bytes of memory
>>>>>>>> node #7 (OS index 7) with 8471564288 bytes of memory
>>>>>>>>
>>>>>>>> *Why does it show the allocated memory as bound to all available nodes?
>>>>>>>> Should it not be allocated to one specific nodeset?*
>>>>>>>>
>>>>>>>> Hello
>>>>>>>>
>>>>>>>> You are confusing the "binding" and the "actual location". Your
>>>>>>>> memory buffer isn't bound to a specific location by default. But Linux has
>>>>>>>> to allocate it somewhere. So your buffer is "located" in some node after
>>>>>>>> the allocation, but it is not "bound" there (the binding is what
>>>>>>>> get_area_membind returns), which means Linux could have allocated it
>>>>>>>> somewhere else.
>>>>>>>>
>>>>>>>> hwloc cannot currently return the "location" of a memory buffer.
>>>>>>>> I have been thinking about adding this feature in the past, but it looks
>>>>>>>> like you are the first actual user requesting this. We could add something
>>>>>>>> like
>>>>>>>> hwloc_get_last_memory_location(topology, input buffer, outputnodeset)
>>>>>>>> At least on Linux that's possible.
>>>>>>>>
>>>>>>>> For clarity, this is similar to the difference between
>>>>>>>> hwloc_get_cpubind() and hwloc_get_last_cpu_location(): A task always runs
>>>>>>>> on a specific PU, even if it is not bound to any specific PU.
>>>>>>>>
>>>>>>>> By the way, there is already an issue for this:
>>>>>>>> https://github.com/open-mpi/hwloc/issues/97
>>>>>>>>
>>>>>>>> Feel free to comment there.
>>>>>>>>
>>>>>>>> Brice
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> hwloc-users mailing list
>>>>>>>> hwloc-us...@open-mpi.org
>>>>>>>> Subscription:
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/hwloc-users/2015/12/1226.php
>>>>>>>
>>>>>>> --
>>>>>>> ------------------------
>>>>>>> RaJu, Rezaul Karim
>>>>>>> Graduate Student (PhD) | Computer Science | University of Houston
>>>>>>> Research in High Performance Computing Tools
>>>>>>> Houston, Texas-77004