We currently consume extra hugepages for writable segments because the
initial MAP_SHARED mmap() leaves its pages resident in the page cache.
Instead, force a copy-on-write of the filesz portion of each writable
segment and then use fadvise() to drop the page cache pages while
keeping our PRIVATE mapping. This is mutually exclusive with segment
sharing, but the code paths are orthogonal because we only allow
sharing of read-only segments.

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

---

In testing with a simple relinked BDT binary that currently uses 3
hugepages, this patch reduces the hugepage consumption to 2. `make
check` still passes on x86, x86_64 and ppc.

diff --git a/elflink.c b/elflink.c
index 5a57358..d47488b 100644
--- a/elflink.c
+++ b/elflink.c
@@ -789,8 +789,9 @@ static int obtain_prepared_file(struct seg_info *htlb_seg_info)
 static void remap_segments(struct seg_info *seg, int num)
 {
        long hpage_size = gethugepagesize();
-       int i;
+       int i, ret;
        void *p;
+       char c;
 
        /*
         * XXX: The bogus call to mmap below forces ld.so to resolve the
@@ -829,6 +830,30 @@ static void remap_segments(struct seg_info *seg, int num)
        /* The segments are all back at this point.
         * and it should be safe to reference static data
         */
+
+       /*
+        * This pagecache dropping code should not be used for shared
+        * segments.  But we currently only share read-only segments, so
+        * the below check for PROT_WRITE is implicitly sufficient.
+        */
+       for (i = 0; i < num; i++) {
+               if (seg[i].prot & PROT_WRITE) {
+                       for (p = seg[i].vaddr;
+                            p <= seg[i].vaddr + seg[i].filesz;
+                            p += hpage_size) {
+                               memcpy(&c, p, 1);
+                               memcpy(p, &c, 1);
+                       }
+                       /*
+                        * fadvise() failing is not actually an error,
+                        * as we'll just use an extra set of hugepages
+                        * (in the pagecache).
+                        */
+                       ret = posix_fadvise(seg[i].fd, 0, 0, POSIX_FADV_DONTNEED);
+                       if (__debug && ret != 0)
+                               DEBUG("fadvise() failed, extra hugepages may be used\n");
+               }
+       }
 }
 
 static int check_env(void)


-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel