On 14.11.2006 [11:11:47 +1100], David Gibson wrote:
> On Mon, Nov 13, 2006 at 04:01:01PM -0800, Nishanth Aravamudan wrote:
> > On 14.11.2006 [10:17:25 +1100], David Gibson wrote:
> > > On Mon, Nov 13, 2006 at 10:06:19AM -0800, Nishanth Aravamudan wrote:
> > > > On 12.11.2006 [11:19:23 +1100], David Gibson wrote:
> > > > > On Fri, Nov 10, 2006 at 11:43:56AM -0800, Nishanth Aravamudan wrote:
> > > > > > Fix linkshare testcase to catch the case where the sharing
> > > > > > children are killed by a signal. Currently, if a child
> > > > > > segfaults, we still PASS the test, when it clearly should
> > > > > > be a FAIL case.
> > > > >
> > > > > Um.. yes, we certainly want this.
> > > >
> > > > Yep, and my solution is incomplete -- we actually want a SIGSEGV
> > > > handler, as that has been (in my testing), the only way to reliably
> > > > catch this fail-case on x86_64. Would it make sense to globally install
> > > > one in test_init(), like we do for SIGINT?
> > >
> > > Huh? Why does AMD64 need a SEGV handler?
> >
> > Sorry, I may have been unclear here, but linkshare is segfaulting on
> > my x86_64 box (and others). Try running make check, then check
> > dmesg, you'll see.
>
> Oh, not just a SEGV but an oops then, if it's showing up in dmesg. I
> don't think a SEGV handler will even catch that.

You clearly don't run x86_64 that often, huh? On x86_64, if a process
segfaults without a handler, a message will be dumped to the kernel log,
like so:

[19629.085922] xBDT.linkshare[17585]: segfault at 00002b9d89c1f040 rip 0000000001001116 rsp 00007fff20ea1410 error 6

> But in any case, I'm not particularly worried about bad failures
> showing up as SEGVs.

It's not a matter of what they show up as, but that they should be
caught as FAILs regardless. Currently, if a child (sharer) in the
linkshare testcase segfaults, we pass the test. That should not be the
case, agreed?

> I don't see how the top level testcase can PASS in this case. I
> consider the overall pass condition for the testsuite to be "all
> tests PASS, *and* there are no deaths by signal or kernel explosions
> during the run".

Yes, it was an uncaught coding error on my part, when I added the
linkshare testcase. I'm fixing it now, though.

> We do, of course, need to work out why the test is SEGVing in the
> first place and fix it. And if the kernel is oopsing, fix that
> regardless of what the testcase is doing.

The kernel is not Oopsing, see above. The linkshare testcases *are*
consistently segfaulting, though.

Here is the new patch I've got so far, still making sure we're catching
all the segfaults, as sometimes it seems like we aren't in my testing.

diff --git a/tests/linkshare.c b/tests/linkshare.c
index 3461a7d..5db2465 100644
--- a/tests/linkshare.c
+++ b/tests/linkshare.c
@@ -24,6 +24,7 @@
 #include <time.h>
 #include <errno.h>
 #include <limits.h>
+#include <signal.h>
 #include <sys/types.h>
 #include <sys/mman.h>
 #include <sys/shm.h>
@@ -134,15 +135,31 @@ static ino_t do_test(struct test_entry *
 	return get_addr_inode(te->data);
 }
 
+static void signal_handler(int signum, siginfo_t *si, void *context)
+{
+	verbose_printf("Process %d got a Segmentation Fault at address %p\n",
+			getpid(), si->si_addr);
+	getchar();
+	exit(RC_FAIL);
+}
+
+static struct sigaction sa = {
+	.sa_sigaction = signal_handler,
+	.sa_flags = SA_SIGINFO,
+};
+
 int main(int argc, char *argv[], char *envp[])
 {
 	int i;
 	int shmid;
 	ino_t *shm;
 	int num_sharings;
+	int status;
 
 	test_init(argc, argv);
 
+	sigaction(SIGSEGV, &sa, NULL);
+
 	if (argc == 2) {
 		/*
 		 * first process
@@ -216,12 +233,24 @@ int main(int argc, char *argv[], char *e
 			}
 		}
 		for (i = 0; i < num_sharings; i++) {
-			ret = waitpid(children[i], NULL, 0);
+			ret = waitpid(children[i], &status, 0);
 			if (ret < 0) {
 				shmctl(shmid, IPC_RMID, NULL);
 				shmdt(shm);
 				FAIL("waitpid failed: %s", strerror(errno));
 			}
+			if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
+				shmctl(shmid, IPC_RMID, NULL);
+				shmdt(shm);
+				FAIL("Child %d exited with abnormal status: %d",
+					i + 1, WEXITSTATUS(status));
+			}
+			if (WIFSIGNALED(status)) {
+				shmctl(shmid, IPC_RMID, NULL);
+				shmdt(shm);
+				FAIL("Child %d was signalled: %d", i + 1,
+					WTERMSIG(status));
+			}
 		}
 		for (i = 0; i < NUM_TESTS; i++) {
 			ino_t base = shm[i];

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel