On Wed, 2007-01-31 at 20:45 -0800, Nishanth Aravamudan wrote:
> We currently use extra hugepages for writable segments because of the
> first MAP_SHARED mmap() which stays resident in the page cache. Use a
> forced COW for the filesz portion of writable segments and then
> fadvise() to drop the page cache pages while keeping our PRIVATE
> mapping. This is mutually exclusive to segment sharing, but is also
> orthogonal in code because we only allow sharing of read-only segments.
>
> Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Looking really nice. Just two things below.
> ---
>
> In testing with a simple relinked BDT binary that currently uses 3
> hugepages, this patch reduces the hugepage consumption to 2. `make
> check` still passes on x86, x86_64 and ppc.
>
> diff --git a/elflink.c b/elflink.c
> index 5a57358..d47488b 100644
> --- a/elflink.c
> +++ b/elflink.c
> @@ -789,8 +789,9 @@ static int obtain_prepared_file(struct seg_info
> *htlb_seg_info)
> static void remap_segments(struct seg_info *seg, int num)
> {
> long hpage_size = gethugepagesize();
> - int i;
> + int i, ret;
> void *p;
> + char c;
>
> /*
> * XXX: The bogus call to mmap below forces ld.so to resolve the
> @@ -829,6 +830,30 @@ static void remap_segments(struct seg_info *seg, int num)
> /* The segments are all back at this point.
> * and it should be safe to reference static data
> */
> +
> + /*
> + * This pagecache dropping code should not be used for shared
> + * segments. But we currently only share read-only segments, so
> + * the below check for PROT_WRITE is implicitly sufficient.
> + */
> + for (i = 0; i < num; i++) {
> + if (seg[i].prot & PROT_WRITE) {
> + for (p = seg[i].vaddr;
> + p <= seg[i].vaddr + seg[i].filesz;
I just realized the above should be (vaddr + filesz + extracopysize).
Otherwise we will throw away the initialized bss data that we've been so
careful to collect earlier.
> + p += hpage_size) {
> + memcpy(&c, p, 1);
> + memcpy(p, &c, 1);
> + }
Can we also add a comment about the above memcpy()s? We should mention
their purpose (to trigger an early COW fault for each hugepage in the
segment). It might also be worthwhile to note that prefaulting pages
here may degrade NUMA performance for very large data segments in
heavily multi-threaded apps.
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel