Nishanth,

This really helped a lot! I am going through the libhugetlbfs code and will start porting once I understand the complete flow.
The current stable kernel for MIPS Linux is 2.6.26, so I will probably migrate to that.

Thanks,
Mehul.

There are only 10 kinds of people in the world... those who know binary and those who do not...

--- On Tue, 12/30/08, Nishanth Aravamudan <n...@us.ibm.com> wrote:

From: Nishanth Aravamudan <n...@us.ibm.com>
Subject: Re: [Libhugetlbfs-devel] Lib hugetlbfs
To: "mehul vora" <mehu...@yahoo.com>
Cc: libhugetlbfs-devel@lists.sourceforge.net, a...@us.ibm.com
Date: Tuesday, December 30, 2008, 11:17 AM

On 29.12.2008 [21:53:03 -0800], mehul vora wrote:
> Nishanth,
>
> Thanks, I will probably migrate to 2.6.27 then.

Well, 2.6.28 is out -- I'd probably go with that, in case there are kernel fixes even since 2.6.27. I am not sure whether there were; you could check the changelog.

> Another doubt I had: how does libhugetlbfs allocate hugepages for an
> application's data/text segments without any kernel component in the
> fork/execv/elfload path? Could you help me understand the flow?

Hugepage allocation for program segments does not involve the kernel for anything other than the fault path. Basically, for each segment you want to back with hugepages, you copy the data from the normal smallpage-backed segment into a hugetlbfs file (unlinked, normally, so it is not visible to snooping applications). Then you unmap the program segments (this is the "black hole" part of the code, where we no longer have a text segment to rely on, for instance) and map the hugetlbfs files in their place.

On fork, the kernel will just perform COW on the hugepages (I think) when faults are taken. On exec, you are just rerunning the original program, so the steps above are done again. For a very heavy fork/exec workload with locality, it might make sense to specify HUGETLB_SHARE=1 (hugectl --share, I think) to share the hugetlbfs file which contains the text segments.
That should reduce copy times (to zero for all but the first process), fault times (because every process has it mapped read-only as text), and hugepage consumption (there will be only one copy of the text segment in hugepages, no matter how many instances of the process are running, as long as they all specify SHARE=1). Experimenting with that will show the benefit for your application, of course.

> I checked the "hugeedit" file; it modifies the program headers of the
> text/data segments and sets the PF_LINUX_HUGETLB bit in p_flags. But
> who finally reads this? Does the kernel's architecture port of
> hugetlbfs have to read this field and allocate a page accordingly?

hugeedit (I think this is covered in the HOWTO, but I'm not sure) is only used to change the default behavior for applications. With old-style relinking (using linker scripts), the default was to remap hugepages into any appropriately marked segments. But with new-style relinking there is no marking of segments, so an environment variable would need to be specified, or the program run under hugectl, I think, for hugepages to be used to back the segments. hugeedit modifies the binary's flags for the segments we want to remap to be "remap by default" by setting that PF_LINUX_HUGETLB flag; it can also unset it. The flag is only used by libhugetlbfs, not the kernel.

Thanks,
Nish

-- 
Nishanth Aravamudan <n...@us.ibm.com>
IBM Linux Technology Center
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel