On 12.10.2006 [11:41:12 -0700], Nishanth Aravamudan wrote:
> On 12.10.2006 [11:12:53 -0700], Nishanth Aravamudan wrote:
> > On 11.10.2006 [11:29:05 +1000], David Gibson wrote:
> > > On Tue, Oct 10, 2006 at 11:28:55AM -0700, Nishanth Aravamudan wrote:
> > > > On 10.10.2006 [15:36:47 +1000], David Gibson wrote:
> > <snip>
> >
> > > > > Incidentally have people been running the testsuite routinely? For me
> > > > > on current mainline it now produces many errors, and crashes the
> > > > > machine (POWER5 LPAR).
> > > >
> > > > We have been running them as regularly as possible. Is this related to
> > > > your recent post to LKML? Or an independent one?
> > >
> > > The crash is related to my lkml post, yes. I'm also getting testcase
> > > failures on a bunch of the share cases, though, and nearly all the
> > > 32-bit versions of the elflink tests.
> >
> > FWIW, everything passes on x86_64. So this would appear to be
> > ppc-specific breakage? I'm bringing up a G5 to do some testing and see
> > if I can't help track down the issues.
>
> I just ran the testsuite on a ppc64 kernel on a 2-way 2.0GHz G5 with 200
> hugepages allocated and it passed just fine.
>
> How many hugepages did you have allocated? If it's fewer than 10, then I
> know the problem and it's fixable.

After some further debugging, an "ah ha" moment: the failed tests appear
to be directly related to your other complaint about the excessive
hugepage requirements. At one point, while running
`watch cat /proc/meminfo` alongside `make func`, I noted that the number
of reserved hugepages (HugePages_Rsvd) reached 150, which is 15 for each
of the 10 processes in the linkshare testcase.

In the short term, while we fix the excessive reservation issue, I can
change the run_tests.sh script to request only as many processes as will
work given the number of free hugepages on the system. How does this
look?

Description: Since we now pad the BSS of relinked binaries, we require a
larger number of hugepages than before, even if most of them are unused.
This leads to issues with the linkshare testcase, as it spawns a fixed
number of processes, all of which consume hugepages and eventually lead
to an ENOMEM (in hugepages) condition. Modify the testcase invocation to
spawn a number of processes relative to the number of free hugepages
(even if the BSS padding is fixed differently, this is a reasonable
thing to do). Also modify the linkshare testcase to fail early when a
non-positive number of processes is requested (which will now occur when
fewer than 30 hugepages are free in the system).

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

diff --git a/tests/linkshare.c b/tests/linkshare.c
index 227af08..f3fd50e 100644
--- a/tests/linkshare.c
+++ b/tests/linkshare.c
@@ -169,6 +169,9 @@ int main(int argc, char *argv[], char *e
 	num_sharings = atoi(argv[1]);
 	if (num_sharings > 99999)
 		FAIL("Too many sharings requested (max = 99999)");
+	if (num_sharings <= 0)
+		FAIL("Number of sharings requested must be greater "
+				"than 0");
 
 	children = (pid_t *)malloc(num_sharings * sizeof(pid_t));
 	if (!children)
diff --git a/tests/run_tests.sh b/tests/run_tests.sh
index b74b1e6..600d8aa 100755
--- a/tests/run_tests.sh
+++ b/tests/run_tests.sh
@@ -64,14 +64,15 @@ elfshare_test () {
 	baseprog="${args[$N]}"
 	unset args[$N]
 	set -- "[EMAIL PROTECTED]"
+	NUM_THREADS=$((`free_hpages` / 15 - 1))
 	killall -HUP hugetlbd
-	run_test HUGETLB_SHARE=2 "$@" "xB.$baseprog" 10
+	run_test HUGETLB_SHARE=2 "$@" "xB.$baseprog" $NUM_THREADS
 	killall -HUP hugetlbd
-	run_test HUGETLB_SHARE=1 "$@" "xB.$baseprog" 10
+	run_test HUGETLB_SHARE=1 "$@" "xB.$baseprog" $NUM_THREADS
 	killall -HUP hugetlbd
-	run_test HUGETLB_SHARE=2 "$@" "xBDT.$baseprog" 10
+	run_test HUGETLB_SHARE=2 "$@" "xBDT.$baseprog" $NUM_THREADS
 	killall -HUP hugetlbd
-	run_test HUGETLB_SHARE=1 "$@" "xBDT.$baseprog" 10
+	run_test HUGETLB_SHARE=1 "$@" "xBDT.$baseprog" $NUM_THREADS
 }
 
 setup_shm_sysctl() {

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel
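
For reference, the sizing arithmetic from the run_tests.sh hunk can be
tried standalone. The following is only a rough sketch: `sharings_for`
and `PAGES_PER_PROC` are hypothetical names that do not exist in
run_tests.sh, the figure of 15 pages per process is the one observed in
/proc/meminfo above, and the clamp to zero is an addition for
illustration (the patch itself lets the value go negative and relies on
linkshare to reject it).

```shell
#!/bin/sh
# Hypothetical helper, not part of run_tests.sh.
# Given a count of free hugepages, print how many sharing processes
# to request from the linkshare testcase.

# Assumption: each process reserves 15 hugepages, as observed above.
PAGES_PER_PROC=15

sharings_for() {
	free=$1
	# Same arithmetic as the patch: integer division, then subtract one.
	n=$(( free / PAGES_PER_PROC - 1 ))
	# Illustrative clamp: never report a negative process count.
	if [ "$n" -lt 0 ]; then
		n=0
	fi
	echo "$n"
}

# Example: the G5 run mentioned earlier had 200 hugepages allocated.
sharings_for 200
```

With 200 free hugepages this requests 12 processes; with fewer than 30
free it requests none, which is the case the new check in linkshare.c
rejects.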