On 20.09.2007 [10:37:03 -0500], Andrew Hastings wrote: > David Gibson wrote: > > On Tue, Sep 18, 2007 at 03:07:42PM -0500, Andrew Hastings wrote: > >> Adjust the kernel's idea of the end of the data segment (the 'brk' value) > >> to match user space when we remap the data segment onto hugepages. > >> > >> Signed-off-by: Andrew Hastings <[EMAIL PROTECTED]> on behalf of Cray Inc. > >> --- > >> > >> While I was running some of the libhugetlbfs linker tests under strace, > >> I noticed brk() was failing. Poking around in /proc/self/maps I > >> discovered why. > >> > >> Breaking brk() (no pun intended) is bad because (1) it causes malloc to > >> make two system calls every time it needs to grow the heap (a failing > >> sbrk and then an mmap) and (2) the process's address space ends up with > >> lots of little VMAs. > >> > >> Tested both 32- and 64-bit binaries under SUSE 10 on x86-64. > > > > Hrm. This doesn't seem right. I suspect it will break on powerpc. > > It's been quite a long time since I thought about this, so some > > somewhat jumbled observations: > > - I don't recall why the break doesn't end up at the end of > > the BSS in any case: the kernel loads the binary into ordinary pages, > > setting the break as it goes before libhugetlbfs remaps into > > hugepages > > - I *do* recall that we don't want the break to end up at the > > end of the (hugepage) BSS on powerpc: because of our address slice > > restrictions that would mean that expanding the break would attempt to > > create normal page mappings in a hugepage region which will fail of > > course (well, unless we still have bugs that cause the kernel to BUG() > > in this case - I think there's something in the testsuite for that > > now, though). > > - You only test the xBDT case in your patch. What about xB > > case? Again, the "normal" value of the break - at the end of the BSS > > is problematic on powerpc. > > - I don't think your testcases needs the CONFIG(). Expanding > > the break should work, regardless of how the executable is mapped. > > David, > > Thanks for your comments! > > I don't have access to a powerpc to verify this, but I'd bet that with > the current libhugetlbfs, the brk value is already in the middle of a > hugepage address slice. So apart from executing some additional code, > this patch shouldn't be making things any worse on powerpc -- the patch > just moves the brk value to a different location within the same > hugepage address slice. Please let me know if I'm wrong. > > Should I change the patch to use something like next_chunk (from > tests/brk_near_huge.c) to move the brk value to the end of the hugepage > address slice containing the BSS? > > I didn't want to build and run the non-remapped cases (since they > wouldn't be exercising the new code in libhugetlbfs), but I couldn't see > a way to do that without serious hacking on tests/Makefile -- thus I > added the CONFIG check. Would you like me to change the patch to run > all the cases, or just eliminate the CONFIG check? > > BTW, here's a code snippet that, when run under strace on x86_64 with > libhugetlbfs-1.2, demonstrates the two bad behaviors I mentioned in my > original post. Link with --hugetlbfs-link=B (or BDT). > > long hpagesize = gethugepagesize(); > > for (i = 0; i < 2*hpagesize/512; i++) { > p = malloc(512); > }
David's suspicions are well-founded, I believe. While testing T-only relinking, I found that 64-bit xB.linkhuge fails at remap_segments with errno=16. [EMAIL PROTECTED]:~/libhugetlbfs-1.2-4-g2dab72f# HUGETLB_ELFMAP=yes LD_LIBRARY_PATH=./obj64/ ./tests/obj64/xB.linkhuge Failed to map hugepage segment 0: 18000000000-18001000000 (errno=16) Aborted [EMAIL PROTECTED]:~/libhugetlbfs-1.2-4-g2dab72f# HUGETLB_VERBOSE=99 HUGETLB_ELFMAP=yes LD_LIBRARY_PATH=./obj64/ ./tests/obj64/xB.linkhuge libhugetlbfs: HUGETLB_SHARE=0, sharing disabled libhugetlbfs: Hugepage segment 0 (phdr 4): 0x18000000000-0x18000010040 (filesz=0x8) (prot = 0x7) libhugetlbfs: libhugetlbfs version: 1.2-4-g2dab72f libhugetlbfs: Mapped hugeseg at 0x3ffff000000. Copying 0x8 bytes from 0x18000000000...done libhugetlbfs: Prepare succeeded libhugetlbfs: Current brk=0x18000011000 libhugetlbfs: 0x18000000000 - 0x18001000000 libhugetlbfs: New brk=0x18001000000 Failed to map hugepage segment 0: 18000000000-18001000000 (errno=16) Aborted Taking out your code, I get: [EMAIL PROTECTED]:~/libhugetlbfs-1.2-4-g2dab72f# HUGETLB_VERBOSE=99 HUGETLB_ELFMAP=yes LD_LIBRARY_PATH=./obj64/ ./tests/obj64/xB.linkhuge libhugetlbfs: HUGETLB_SHARE=0, sharing disabled libhugetlbfs: Hugepage segment 0 (phdr 4): 0x18000000000-0x18000010040 (filesz=0x8) (prot = 0x7) libhugetlbfs: libhugetlbfs version: 1.2-4-g2dab72f libhugetlbfs: Mapped hugeseg at 0x3ffff000000. Copying 0x8 bytes from 0x18000000000...done libhugetlbfs: Prepare succeeded Starting testcase "./tests/obj64/xB.linkhuge", pid 14611 Link string is [xB], HUGETLB_ELFMAP=yes Hugepages used for: small_bss big_bss PASS The problem is you're bumping the brk() up within a hugepage region. And since we haven't mapped in small pages yet, the brk() succeeds with small pages and then the hugepage map-in fails (not entirely sure why it gives EBUSY...) That makes this patch a complete no-go for ppc, as David suspected, I think. Thanks, Nish -- Nishanth Aravamudan <[EMAIL PROTECTED]> IBM Linux Technology Center ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Libhugetlbfs-devel mailing list Libhugetlbfs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel