On 20.09.2007 [10:37:03 -0500], Andrew Hastings wrote:
> David Gibson wrote:
> > On Tue, Sep 18, 2007 at 03:07:42PM -0500, Andrew Hastings wrote:
> >> Adjust the kernel's idea of the end of the data segment (the 'brk' value)
> >> to match user space when we remap the data segment onto hugepages.
> >>
> >> Signed-off-by: Andrew Hastings <[EMAIL PROTECTED]> on behalf of Cray Inc.
> >> ---
> >>
> >> While I was running some of the libhugetlbfs linker tests under strace,
> >> I noticed brk() was failing.  Poking around in /proc/self/maps I
> >> discovered why.
> >>
> >> Breaking brk() (no pun intended) is bad because (1) it causes malloc to
> >> make two system calls every time it needs to grow the heap (a failing
> >> sbrk and then an mmap) and (2) the process's address space ends up with
> >> lots of little VMAs.
> >>
> >> Tested both 32- and 64-bit binaries under SUSE 10 on x86-64.
> > 
> > Hrm.  This doesn't seem right.  I suspect it will break on powerpc.
> > It's been quite a long time since I thought about this, so some
> > somewhat jumbled observations:
> >     - I don't recall why the break doesn't end up at the end of
> > the BSS in any case: the kernel loads the binary into ordinary pages,
> > setting the break as it goes before libhugetlbfs remaps into
> > hugepages
> >     - I *do* recall that we don't want the break to end up at the
> > end of the (hugepage) BSS on powerpc: because of our address slice
> > restrictions that would mean that expanding the break would attempt to
> > create normal page mappings in a hugepage region which will fail of
> > course (well, unless we still have bugs that cause the kernel to BUG()
> > in this case - I think there's something in the testsuite for that
> > now, though).
> >     - You only test the xBDT case in your patch.  What about xB
> > case?  Again, the "normal" value of the break - at the end of the BSS
> > is problematic on powerpc.
> >     - I don't think your testcases needs the CONFIG().  Expanding
> > the break should work, regardless of how the executable is mapped.
> 
> David,
> 
> Thanks for your comments!
> 
> I don't have access to a powerpc to verify this, but I'd bet that with 
> the current libhugetlbfs, the brk value is already in the middle of a 
> hugepage address slice.  So apart from executing some additional code, 
> this patch shouldn't be making things any worse on powerpc -- the patch 
> just moves the brk value to a different location within the same 
> hugepage address slice.  Please let me know if I'm wrong.
> 
> Should I change the patch to use something like next_chunk (from 
> tests/brk_near_huge.c) to move the brk value to the end of the hugepage 
> address slice containing the BSS?
> 
> I didn't want to build and run the non-remapped cases (since they 
> wouldn't be exercising the new code in libhugetlbfs), but I couldn't see 
> a way to do that without serious hacking on tests/Makefile -- thus I 
> added the CONFIG check.  Would you like me to change the patch to run 
> all the cases, or just eliminate the CONFIG check?
> 
> BTW, here's a code snippet that, when run under strace on x86_64 with 
> libhugetlbfs-1.2, demonstrates the two bad behaviors I mentioned in my 
> original post.  Link with --hugetlbfs-link=B (or BDT).
> 
>          long hpagesize = gethugepagesize();
> 
>          for (i = 0; i < 2*hpagesize/512; i++) {
>                  p = malloc(512);
>          }

David's suspicions are well-founded, I believe. While testing T-only
relinking, I found that 64-bit xB.linkhuge fails at remap_segments with
errno=16.

[EMAIL PROTECTED]:~/libhugetlbfs-1.2-4-g2dab72f# HUGETLB_ELFMAP=yes 
LD_LIBRARY_PATH=./obj64/  ./tests/obj64/xB.linkhuge
Failed to map hugepage segment 0: 18000000000-18001000000 (errno=16)
Aborted
[EMAIL PROTECTED]:~/libhugetlbfs-1.2-4-g2dab72f# HUGETLB_VERBOSE=99 
HUGETLB_ELFMAP=yes LD_LIBRARY_PATH=./obj64/  ./tests/obj64/xB.linkhuge
libhugetlbfs: HUGETLB_SHARE=0, sharing disabled
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x18000000000-0x18000010040  
(filesz=0x8) (prot = 0x7)
libhugetlbfs: libhugetlbfs version: 1.2-4-g2dab72f
libhugetlbfs: Mapped hugeseg at 0x3ffff000000. Copying 0x8 bytes from 
0x18000000000...done
libhugetlbfs: Prepare succeeded
libhugetlbfs: Current brk=0x18000011000
libhugetlbfs: 0x18000000000 - 0x18001000000
libhugetlbfs: New brk=0x18001000000
Failed to map hugepage segment 0: 18000000000-18001000000 (errno=16)
Aborted

Taking out your code, I get:

[EMAIL PROTECTED]:~/libhugetlbfs-1.2-4-g2dab72f# HUGETLB_VERBOSE=99 
HUGETLB_ELFMAP=yes LD_LIBRARY_PATH=./obj64/  ./tests/obj64/xB.linkhuge
libhugetlbfs: HUGETLB_SHARE=0, sharing disabled
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x18000000000-0x18000010040  
(filesz=0x8) (prot = 0x7)
libhugetlbfs: libhugetlbfs version: 1.2-4-g2dab72f
libhugetlbfs: Mapped hugeseg at 0x3ffff000000. Copying 0x8 bytes from 
0x18000000000...done
libhugetlbfs: Prepare succeeded
Starting testcase "./tests/obj64/xB.linkhuge", pid 14611
Link string is [xB], HUGETLB_ELFMAP=yes
Hugepages used for: small_bss big_bss
PASS

The problem is you're bumping the brk() up within a hugepage region. And since
we haven't mapped in small pages yet, the brk() succeeds with small pages and
then the hugepage map-in fails (not entirely sure why it gives EBUSY...)

That makes this patch a complete no-go for ppc, as David suspected, I
think.

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

Reply via email to