On Thu, Sep 20, 2007 at 10:37:03AM -0500, Andrew Hastings wrote: > David Gibson wrote: > >On Tue, Sep 18, 2007 at 03:07:42PM -0500, Andrew Hastings wrote: > >>Adjust the kernel's idea of the end of the data segment (the 'brk' value) > >>to match user space when we remap the data segment onto hugepages. > >> > >>Signed-off-by: Andrew Hastings <[EMAIL PROTECTED]> on behalf of Cray Inc. > >>--- > >> > >>While I was running some of the libhugetlbfs linker tests under strace, > >>I noticed brk() was failing. Poking around in /proc/self/maps I > >>discovered why. > >> > >>Breaking brk() (no pun intended) is bad because (1) it causes malloc to > >>make two system calls every time it needs to grow the heap (a failing > >>sbrk and then an mmap) and (2) the process's address space ends up with > >>lots of little VMAs. > >> > >>Tested both 32- and 64-bit binaries under SUSE 10 on x86-64. > > > >Hrm. This doesn't seem right. I suspect it will break on powerpc. > >It's been quite a long time since I thought about this, so some > >somewhat jumbled observations: > > - I don't recall why the break doesn't end up at the end of > >the BSS in any case: the kernel loads the binary into ordinary pages, > >setting the break as it goes before libhugetlbfs remaps into > >hugepages > > - I *do* recall that we don't want the break to end up at the > >end of the (hugepage) BSS on powerpc: because of our address slice > >restrictions that would mean that expanding the break would attempt to > >create normal page mappings in a hugepage region which will fail of > >course (well, unless we still have bugs that cause the kernel to BUG() > >in this case - I think there's something in the testsuite for that > >now, though). > > - You only test the xBDT case in your patch. What about xB > >case? Again, the "normal" value of the break - at the end of the BSS > >is problematic on powerpc. > > - I don't think your testcases needs the CONFIG(). Expanding > >the break should work, regardless of how the executable is mapped. > > David, > > Thanks for your comments! > > I don't have access to a powerpc to verify this, but I'd bet that with > the current libhugetlbfs, the brk value is already in the middle of a > hugepage address slice.
> So apart from executing some additional code, > this patch shouldn't be making things any worse on powerpc -- the patch > just moves the brk value to a different location within the same > hugepage address slice. Please let me know if I'm wrong. I can believe that. But again, I don't understand in detail how the break positioning is different with your patch than without. I'd rather think about this and get it right rather than just "better than now under some circumstances" > Should I change the patch to use something like next_chunk (from > tests/brk_near_huge.c) to move the brk value to the end of the hugepage > address slice containing the BSS? Possibly. Although... currently we actually pad the BSS out to the end of the segment which might deal with that already. > I didn't want to build and run the non-remapped cases (since they > wouldn't be exercising the new code in libhugetlbfs), but I couldn't see > a way to do that without serious hacking on tests/Makefile -- thus I > added the CONFIG check. Would you like me to change the patch to run > all the cases, or just eliminate the CONFIG check? !? This seems even worse - since the other cases will be run, the testsuite will *always* get CONFIG errors and thus not run clean. In any case it should always be easy to set which tests are run, just by changing run_tests.sh In any case, I think it's worth running all the cases - it may not exercise libhugetlbfs code, but it means if it breaks it's immediately obvious whether it's the remapping that has broken things or whether we just broke the testcase itself somehow. Plus the xB case certainly does exercise things, > BTW, here's a code snippet that, when run under strace on x86_64 with > libhugetlbfs-1.2, demonstrates the two bad behaviors I mentioned in my > original post. Link with --hugetlbfs-link=B (or BDT). > > long hpagesize = gethugepagesize(); > > for (i = 0; i < 2*hpagesize/512; i++) { > p = malloc(512); > } -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Libhugetlbfs-devel mailing list Libhugetlbfs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel