On Tue, Jan 23, 2007 at 10:56:23PM +0000, Adam Litke wrote:
> Sonny Rao discovered a bug which affects powerpc machines.  If the stack
> is allowed to grow into a segment that has been converted over to huge
> pages, current kernels will try to instantiate normal pages into a huge
> page-only segment.  This test case reproduces this behavior for 32bit
> and 64bit binaries.
> 
> Since there is no upstream fix for this bug yet, I don't think it's wise
> to run it all the time and needlessly render machines unstable.
> However, we don't want to forget that it's not being run by default
> either.  This patch also introduces a SKIP mode to run_tests.sh.  It
> will cause the named test to be skipped and a message printed that
> indicates that a test was skipped.

Hrm.  I'd prefer a separate 'skip_test' function for this.

Incidentally, something that would be a really good idea, but I've
never got around to would be some 'summary' support in the test code:
at the end of the run it would print the total number of tests passed,
failed, skipped (due to bad config or this new skip notation), etc.

> diff --git a/tests/Makefile b/tests/Makefile
> index 7cd823a..7ba7ba7 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -6,7 +6,7 @@ LIB_TESTS = gethugepagesize test_root fi
>       chunk-overcommit mprotect alloc-instantiate-race mlock \
>       truncate_reserve_wraparound truncate_sigbus_versus_oom \
>       map_high_truncate_2 truncate_above_4GB direct \
> -     misaligned_offset brk_near_huge task-size-overrun
> +     misaligned_offset brk_near_huge task-size-overrun stack_grow_into_huge
>  LIB_TESTS_64 = straddle_4GB huge_at_4GB_normal_below \
>       huge_below_4GB_normal_above
>  NOLIB_TESTS = malloc malloc_manysmall dummy
> diff --git a/tests/run_tests.sh b/tests/run_tests.sh
> index 30ca82a..a914813 100755
> --- a/tests/run_tests.sh
> +++ b/tests/run_tests.sh
> @@ -35,7 +35,10 @@ run_test_bits () {
>      BITS=$1
>      shift
>  
> -    if [ -d obj$BITS ]; then
> +    if [ "$1" == "SKIP" ]; then
> +     shift
> +     echo "$@ ($BITS):       SKIPPED"
> +    elif [ -d obj$BITS ]; then
>       echo -n "$@ ($BITS):    "
>       PATH="obj$BITS:$PATH" LD_LIBRARY_PATH="$LD_LIBRARY_PATH:../obj$BITS" 
> $ENV "$@"
>      fi
> @@ -154,6 +157,7 @@ # Specific kernel bug tests
>      run_test truncate_above_4GB
>      run_test brk_near_huge
>      run_test task-size-overrun
> +    run_test SKIP stack_grow_into_huge
>  
>  # Tests requiring an active mount and hugepage COW
>      run_test private
> diff --git a/tests/stack_grow_into_huge.c b/tests/stack_grow_into_huge.c
> new file mode 100644
> index 0000000..ff1058e
> --- /dev/null
> +++ b/tests/stack_grow_into_huge.c
> @@ -0,0 +1,89 @@
> +/*
> + * libhugetlbfs - Easy use of Linux hugepages
> + * Copyright (C) 2005-2006 David Gibson & Adam Litke, IBM Corporation.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public License
> + * as published by the Free Software Foundation; either version 2.1 of
> + * the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/resource.h>
> +#include <sys/wait.h>
> +
> +#include <hugetlbfs.h>
> +#include "hugetests.h"
> +
> +/*
> + * Test rationale:
> + *
> + * On PowerPC, the address space is divided into segments.  These segments 
> can
> + * contain either huge pages or normal pages, but not both.  All segments are
> + * initially set up to map normal pages.  When a huge page mapping is created
> + * within a set of empty segments, they are "enabled" for huge pages at that
> + * time.  Once enabled for huge pages, they can not be used again for normal
> + * pages for the remaining lifetime of the process.
> + *
> + * If the segment immediately preceeding the segment containing the stack is
> + * converted to huge pages and the stack is made to grow into the this
> + * preceeding segment, some kernels may attempt to map normal pages into the
> + * huge page-only segment -- resulting in bugs.
> + */
> +
> +int main(int argc, char *argv[])
> +{
> +     int fd, i, pid, s;
> +     struct rlimit r;
> +     char *b;
> +     unsigned long hpage_size = gethugepagesize();
> +
> +     test_init(argc, argv);
> +
> +     i = getrlimit(RLIMIT_STACK, &r);
> +     if (i)
> +             CONFIG("getrlimit failed");
> +
> +     if (r.rlim_cur != RLIM_INFINITY)
> +             CONFIG("Stack rlimit must be 'unlimited'");

I'm a bit torn here.  We're now running the testsuite as root, so we
could do a setrlimit here, simplifying things.  On the other hand I
rather miss the fact that the testsuite can't be run as non-root any
more, so I'm disinclined to make it harder to re-acquire that.

> +     fd = hugetlbfs_unlinked_fd();
> +     if (fd < 0)
> +             CONFIG("Couldn't get hugepage fd");
> +
> +     b = mmap(0, hpage_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> +     if (b != MAP_FAILED) {
> +             munmap(b, hpage_size);
> +     } else
> +             FAIL("mmap");

Braces on neither or both clauses, please.

> +     if ((pid = fork()) < 0)
> +             FAIL("fork");
> +
> +     if (pid == 0) {
> +             while (1) { alloca(16*1024*1024); }

Ow, please don't skimp on proper formatting, extra lines don't cost.
In any case I'd be inclined to but the child logic into a separate
function for clarity.

> +             exit (0);

And make use of the exit(), to de-indent the rest of the checks.

> +     } else {
> +             waitpid(pid, &s, 0);

Best to check waitpid()s return code, just in case.

> +             if (WIFSIGNALED(s)) {
> +                     if (WTERMSIG(s) == SIGTRAP) {
> +                             FAIL("SIGTRAP received");
> +                     } else {
> +                             PASS();
> +                     }

This could do with some explaining comments.  And probably an inverted
sense of the test: what signal *should* we get if we get a stack
collision (I'm guessing SEGV, but we'd have to check the kernel code).

> +             }
> +     }
> +     FAIL("Child not signalled");
> +}
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

Reply via email to