Author: Nishanth Aravamudan <[EMAIL PROTECTED]>
Date: Mon Feb 5 14:22:02 2007 -0800
elflink: drop hugepage cached pages for writable segments
We currently use extra hugepages for writable segments because the
pages populated through the first MAP_SHARED mmap() stay resident in the
page cache. Force a COW fault on each hugepage of the filesz portion of
writable segments (and of the extracopy area when minimal_copy is
enabled), then use fsync() and posix_fadvise() to drop the page cache
pages while keeping our MAP_PRIVATE mapping. This is mutually exclusive
with segment sharing, but the two do not interact in the code because we
only allow sharing of read-only segments.
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
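The technique described above (COW every page of a private file mapping, then drop the cached file pages) can be tried outside libhugetlbfs. The following is a minimal sketch under stated assumptions, not the patch's code: it uses an ordinary temporary file and the normal page size instead of hugetlbfs, and the helper names `force_cow()` and `cow_then_drop_cache()` are hypothetical. It fills the file through a MAP_SHARED mapping, remaps it MAP_PRIVATE, takes a write fault on every page, then calls fsync() and posix_fadvise(POSIX_FADV_DONTNEED). Unlike the patch's memcpy() pair, it writes through a volatile pointer so the compiler cannot treat the store as redundant.

```c
#define _GNU_SOURCE /* posix_fadvise(), mkstemp(), pread() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read one byte per page and write it back through a volatile pointer.
 * On a writable MAP_PRIVATE mapping each store takes a COW fault, so the
 * process gets its own anonymous copy of that page. */
static void force_cow(void *addr, size_t len, size_t pgsz)
{
	volatile char *p = addr;

	for (size_t off = 0; off < len; off += pgsz)
		p[off] = p[off];
}

/* Hypothetical demo: returns 0 if the private mapping keeps its data after
 * the page cache is dropped and the file never sees the private write. */
int cow_then_drop_cache(void)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = 4 * pgsz;
	char path[] = "/tmp/cowdropXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file persists until close(fd) */

	/* Fill the file through a MAP_SHARED mapping. */
	char *shared = MAP_FAILED;
	if (ftruncate(fd, (off_t)len) == 0)
		shared = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (shared == MAP_FAILED) {
		close(fd);
		return -1;
	}
	memset(shared, 'A', len);
	munmap(shared, len);

	char *priv = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (priv == MAP_FAILED) {
		close(fd);
		return -1;
	}
	force_cow(priv, len, pgsz);
	priv[0] = 'B';			/* private change; must not reach the file */

	/* Flush, then advise the kernel to drop the cached file pages.
	 * posix_fadvise() returns an error number instead of setting errno;
	 * a failure only means the pages stay cached. */
	fsync(fd);
	int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (rc != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

	/* The private copies survive; the file still holds the original. */
	int ok = priv[0] == 'B';
	for (size_t i = 1; i < len; i++)
		ok = ok && priv[i] == 'A';

	char c = 0;
	ok = ok && pread(fd, &c, 1, 0) == 1 && c == 'A';

	munmap(priv, len);
	close(fd);
	return ok ? 0 : -1;
}
```

On a regular file the kernel simply re-reads any dropped page on demand, so this only checks correctness (private data kept, file untouched); the hugepage savings the patch targets are specific to hugetlbfs and are not measured here.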
diff --git a/elflink.c b/elflink.c
index 780d87c..a0b5569 100644
--- a/elflink.c
+++ b/elflink.c
@@ -793,6 +793,7 @@ static void remap_segments(struct seg_info *seg, int num)
long hpage_size = gethugepagesize();
int i;
void *p;
+ char c;
/*
* XXX: The bogus call to mmap below forces ld.so to resolve the
@@ -831,6 +832,51 @@ static void remap_segments(struct seg_info *seg, int num)
/* The segments are all back at this point.
* and it should be safe to reference static data
*/
+
+ /*
+ * This pagecache dropping code should not be used for shared
+ * segments. But we currently only share read-only segments, so
+ * the below check for PROT_WRITE is implicitly sufficient.
+ */
+ for (i = 0; i < num; i++) {
+ if (seg[i].prot & PROT_WRITE) {
+ /*
+ * take a COW fault on each hugepage in the
+ * segment's file data ...
+ */
+ for (p = seg[i].vaddr;
+ p <= seg[i].vaddr + seg[i].filesz;
+ p += hpage_size) {
+ memcpy(&c, p, 1);
+ memcpy(p, &c, 1);
+ }
+ /*
+ * ... as well as each hugepage in the
+ * extracopy area
+ *
+ * Note: if minimal_copy is enabled, we do *not*
+ * want to prefault in the remainder of the
+ * segment, as the upper boundary is quite high
+ * (to avoid collisions with small pages)
+ */
+ if (seg[i].extra_vaddr && minimal_copy) {
+ for (p = seg[i].extra_vaddr;
+ p <= seg[i].extra_vaddr +
+ seg[i].extrasz;
+ p += hpage_size) {
+ memcpy(&c, p, 1);
+ memcpy(p, &c, 1);
+ }
+ }
+ /*
+ * Note: fadvise() failing is not actually an
+ * error, as we'll just use an extra set of
+ * hugepages (in the pagecache).
+ */
+ fsync(seg[i].fd);
+ posix_fadvise(seg[i].fd, 0, 0, POSIX_FADV_DONTNEED);
+ }
+ }
}
static int check_env(void)
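The COW loop in the patch steps from seg[i].vaddr by hpage_size while p <= vaddr + filesz. Because each step advances exactly one hugepage, each iteration lands in a new page, and the loop bounds can be checked with plain integer arithmetic. The sketch below models the loop on integers (the function names are hypothetical, and a small page size of 10 is used only to keep the numbers readable). It shows two edge cases: with a hugepage-aligned start and an aligned end, the inclusive bound also touches the address vaddr + filesz, one byte past the file data; with an unaligned start (e.g. vaddr = 9, filesz = 12, page size 10), the stride can skip the last page holding file data. Whether either case arises depends on how segment addresses are aligned, which this patch does not show. `aligned_loop_pages()` is a hypothetical alternative that rounds the start down and stops before the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Iterations (= distinct hugepages touched) of the patch's loop
 *   for (p = vaddr; p <= vaddr + filesz; p += hpage_size)
 * modelled with integers instead of pointers. */
size_t patch_loop_pages(uintptr_t vaddr, size_t filesz, size_t hpage)
{
	size_t n = 0;

	for (uintptr_t p = vaddr; p <= vaddr + filesz; p += hpage)
		n++;
	return n;
}

/* Hugepages that actually hold bytes [vaddr, vaddr + filesz).
 * Assumes filesz > 0. */
size_t pages_spanned(uintptr_t vaddr, size_t filesz, size_t hpage)
{
	uintptr_t first = vaddr / hpage;
	uintptr_t last = (vaddr + filesz - 1) / hpage;

	return (size_t)(last - first + 1);
}

/* Hypothetical alternative: start at the hugepage boundary at or below
 * vaddr and stop before the end of the data. For filesz > 0 this
 * touches exactly the pages counted by pages_spanned(). */
size_t aligned_loop_pages(uintptr_t vaddr, size_t filesz, size_t hpage)
{
	size_t n = 0;

	for (uintptr_t p = vaddr - vaddr % hpage; p < vaddr + filesz; p += hpage)
		n++;
	return n;
}
```

For example, with vaddr = 0 and filesz = 20 (page size 10), the patch's loop runs three times while the data spans two pages. With vaddr = 9 and filesz = 12, it runs twice while the data spans three pages.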
--
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel