On 01.02.2007 [18:24:39 -0800], Nishanth Aravamudan wrote:
> On 01.02.2007 [18:20:22 -0800], Nishanth Aravamudan wrote:
> > On 01.02.2007 [09:43:01 -0600], Adam Litke wrote:
> > > On Wed, 2007-01-31 at 20:45 -0800, Nishanth Aravamudan wrote:
> <snip>
> > > > + for (i = 0; i < num; i++) {
> > > > + if (seg[i].prot & PROT_WRITE) {
> > > > + for (p = seg[i].vaddr;
> > > > + p <= seg[i].vaddr + seg[i].filesz;
> > > I just realized the above should be (vaddr + filesz + extracopysize).
> > > Otherwise we will throw away the initialized bss data that we've been so
> > > careful to collect earlier.
> >
> > This does complicate things quite a bit, unfortunately, as we
> > currently throw away the extracopy information once we've used it.
> > I've modified the code a bit to save this info, and I think I've got
> > it right, but I'd appreciate you and Steve taking a look at the
> > following two patches, which should allow for what you want.
commit be71d7660e0be93d7b5c4791508fa7a9a3cb23b4
Author: Nishanth Aravamudan <[EMAIL PROTECTED]>
Date: Thu Feb 1 18:07:21 2007 -0800
We currently use extra hugepages for writable segments because of the
first MAP_SHARED mmap() which stays resident in the page cache. Use a
forced COW for the filesz portion of writable segments and then
fadvise() to drop the page cache pages while keeping our PRIVATE
mapping. This is mutually exclusive with segment sharing, but is also
orthogonal in code because we only allow sharing of read-only segments.
Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
diff --git a/elflink.c b/elflink.c
index a8d5d4a..4496216 100644
--- a/elflink.c
+++ b/elflink.c
@@ -792,8 +792,9 @@ static int obtain_prepared_file(struct seg_info *htlb_seg_info)
 static void remap_segments(struct seg_info *seg, int num)
 {
         long hpage_size = gethugepagesize();
-        int i;
+        int i, ret;
         void *p;
+        char c;
 
         /*
          * XXX: The bogus call to mmap below forces ld.so to resolve the
@@ -832,6 +833,47 @@ static void remap_segments(struct seg_info *seg, int num)
         /* The segments are all back at this point.
          * and it should be safe to reference static data
          */
+
+        /*
+         * This pagecache dropping code should not be used for shared
+         * segments. But we currently only share read-only segments, so
+         * the below check for PROT_WRITE is implicitly sufficient.
+         */
+        for (i = 0; i < num; i++) {
+                if (seg[i].prot & PROT_WRITE) {
+                        /*
+                         * take a COW fault on each hugepage in the
+                         * segment's file data ...
+                         */
+                        for (p = seg[i].vaddr;
+                             p <= seg[i].vaddr + seg[i].filesz;
+                             p += hpage_size) {
+                                memcpy(&c, p, 1);
+                                memcpy(p, &c, 1);
+                        }
+                        /*
+                         * ... as well as each hugepage in the
+                         * extracopy area
+                         */
+                        if (seg[i].extra_vaddr) {
+                                for (p = seg[i].extra_vaddr;
+                                     p <= seg[i].extra_vaddr + seg[i].extrasz;
+                                     p += hpage_size) {
+                                        memcpy(&c, p, 1);
+                                        memcpy(p, &c, 1);
+                                }
+                        }
+                        /*
+                         * fadvise() failing is not actually an error,
+                         * as we'll just use an extra set of hugepages
+                         * (in the pagecache).
+                         */
+                        ret = posix_fadvise(seg[i].fd, 0, 0,
+                                            POSIX_FADV_DONTNEED);
+                        if (__debug && ret < 0)
+                                DEBUG("fadvise() failed, extra hugepages may be used\n");
+                }
+        }
 }
 
 static int check_env(void)
--
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel