On 01.02.2007 [09:43:01 -0600], Adam Litke wrote:
> On Wed, 2007-01-31 at 20:45 -0800, Nishanth Aravamudan wrote:
> > We currently use extra hugepages for writable segments because of the
> > first MAP_SHARED mmap() which stays resident in the page cache. Use a
> > forced COW for the filesz portion of writable segments and then
> > fadvise() to drop the page cache pages while keeping our PRIVATE
> > mapping. This is mutually exclusive to segment sharing, but is also
> > orthogonal in code because we only allow sharing of read-only segments.
> >
> > Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
>
> Looking really nice. Just two things below.
>
> > ---
> >
> > In testing with a simple relinked BDT binary that currently uses 3
> > hugepages, this patch reduces the hugepage consumption to 2. `make
> > check` still passes on x86, x86_64 and ppc.
> >
> > diff --git a/elflink.c b/elflink.c
> > index 5a57358..d47488b 100644
> > --- a/elflink.c
> > +++ b/elflink.c
> > @@ -789,8 +789,9 @@ static int obtain_prepared_file(struct seg_info *htlb_seg_info)
> > static void remap_segments(struct seg_info *seg, int num)
> > {
> > long hpage_size = gethugepagesize();
> > - int i;
> > + int i, ret;
> > void *p;
> > + char c;
> >
> > /*
> > * XXX: The bogus call to mmap below forces ld.so to resolve the
> > @@ -829,6 +830,30 @@ static void remap_segments(struct seg_info *seg, int num)
> > /* The segments are all back at this point.
> > * and it should be safe to reference static data
> > */
> > +
> > + /*
> > + * This pagecache dropping code should not be used for shared
> > + * segments. But we currently only share read-only segments, so
> > + * the below check for PROT_WRITE is implicitly sufficient.
> > + */
> > + for (i = 0; i < num; i++) {
> > + if (seg[i].prot & PROT_WRITE) {
> > + for (p = seg[i].vaddr;
> > + p <= seg[i].vaddr + seg[i].filesz;
> I just realized the above should be (vaddr + filesz + extracopysize).
> Otherwise we will throw away the initialized bss data that we've been
> so careful to collect earlier.
>
> > + p += hpage_size) {
> > + memcpy(&c, p, 1);
> > + memcpy(p, &c, 1);
> > + }
> Can we also add a comment about the above memcpy()s? We should
> mention their purpose (to trigger an early COW fault for each hugepage
> in the segment). It might also be worthwhile to note that prefaulting
> pages here may degrade NUMA performance for very large data segments
> in heavily multi-threaded apps.
Hrm, do we want to make this configurable too, then, so that heavily
multi-threaded NUMA users can opt out of the prefaulting?
Thanks,
Nish
--
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel