On 01.02.2007 [09:43:01 -0600], Adam Litke wrote:
> On Wed, 2007-01-31 at 20:45 -0800, Nishanth Aravamudan wrote:
> > We currently use extra hugepages for writable segments because of the
> > first MAP_SHARED mmap() which stays resident in the page cache. Use a
> > forced COW for the filesz portion of writable segments and then
> > fadvise() to drop the page cache pages while keeping our PRIVATE
> > mapping. This is mutually exclusive to segment sharing, but is also
> > orthogonal in code because we only allow sharing of read-only segments.
> >
> > Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
>
> Looking really nice. Just two things below.
>
> > ---
> >
> > In testing with a simple relinked BDT binary that currently uses 3
> > hugepages, this patch reduces the hugepage consumption to 2. `make
> > check` still passes on x86, x86_64 and ppc.
> >
> > diff --git a/elflink.c b/elflink.c
> > index 5a57358..d47488b 100644
> > --- a/elflink.c
> > +++ b/elflink.c
> > @@ -789,8 +789,9 @@ static int obtain_prepared_file(struct seg_info *htlb_seg_info)
> > static void remap_segments(struct seg_info *seg, int num)
> > {
> > long hpage_size = gethugepagesize();
> > - int i;
> > + int i, ret;
> > void *p;
> > + char c;
> >
> > /*
> > * XXX: The bogus call to mmap below forces ld.so to resolve the
> > @@ -829,6 +830,30 @@ static void remap_segments(struct seg_info *seg, int num)
> > /* The segments are all back at this point.
> > * and it should be safe to reference static data
> > */
> > +
> > + /*
> > + * This pagecache dropping code should not be used for shared
> > + * segments. But we currently only share read-only segments, so
> > + * the below check for PROT_WRITE is implicitly sufficient.
> > + */
> > + for (i = 0; i < num; i++) {
> > + if (seg[i].prot & PROT_WRITE) {
> > + for (p = seg[i].vaddr;
> > + p <= seg[i].vaddr + seg[i].filesz;
> I just realized the above should be (vaddr + filesz + extracopysize).
> Otherwise we will throw away the initialized bss data that we've been
> so careful to collect earlier.
>
> > + p += hpage_size) {
> > + memcpy(&c, p, 1);
> > + memcpy(p, &c, 1);
> > + }
> Can we also add a comment about the above memcpy()s? We should
> mention their purpose (to trigger an early COW fault for each hugepage
> in the segment). It might also be worthwhile to note that prefaulting
> pages here may degrade NUMA performance for very large data segments
> in heavily multi-threaded apps.
Hrm, do we want to make this configurable too, then, so that heavily
multi-threaded NUMA users can opt out of the prefaulting?
Thanks,
Nish
--
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel