Hi all, I'm hitting a brick wall debugging the linkshare segfaults I'm seeing.
These logs are from my 2-way x86_64, but I'm seeing similar issues on a G5 (ppc64):

HUGETLB_SHARE=2 xB.linkshare 2 (32): PASS
HUGETLB_SHARE=2 xB.linkshare 2 (64): PASS
HUGETLB_SHARE=1 xB.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=1 xB.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=2 xBDT.linkshare 2 (32): PASS
HUGETLB_SHARE=2 xBDT.linkshare 2 (64): PASS
HUGETLB_SHARE=1 xBDT.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=1 xBDT.linkshare 2 (64): FAIL 2 of 2 children exited abnormally

All 4 of these failures are segmentation faults that we caught.

I'm including below four outputs from my testing, as well as the current patch I'm using.

[1] The patch I'm using, which is applied on top of the patch I sent to the list earlier to rm -rf files in /mnt/hugetlbfs between runs of elfshare_test.

[2] The output of `make func` from the libhuge testsuite. We get 3 PASSes and 5 FAILs.

[3] The dmesg output from the `make func` run, with a `dmesg -c` beforehand. For some reason we're not catching some segfaults?

[4] The output of `make funcv` (which should only be changing the verbosity of the tests) from the libhuge testsuite. Note, we now get 2 PASSes and 6 FAILs.

[5] The dmesg output from the `make funcv` run, with a `dmesg -c` beforehand. We're again not catching all the segfaults.

Three problems, then. Obviously, one is why linkshare is segfaulting. Two is why func and funcv differ. And three is why we are not catching all the segfaults with a SIGSEGV handler.
Thanks,
Nish

[1]

diff --git a/tests/linkshare.c b/tests/linkshare.c
index 3461a7d..308fd75 100644
--- a/tests/linkshare.c
+++ b/tests/linkshare.c
@@ -16,6 +16,7 @@
  * License along with this library; if not, write to the Free Software
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
  */
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
@@ -24,6 +25,8 @@
 #include <time.h>
 #include <errno.h>
 #include <limits.h>
+#include <signal.h>
+#include <string.h>
 #include <sys/types.h>
 #include <sys/mman.h>
 #include <sys/shm.h>
@@ -134,15 +137,35 @@ static ino_t do_test(struct test_entry *
 	return get_addr_inode(te->data);
 }
+void signal_handler(int signum, siginfo_t *si, void *context)
+{
+	verbose_printf("Process %d got a Segmentation Fault at address %p\n",
+			getpid(), si->si_addr);
+	exit(RC_FAIL);
+}
+
+static struct sigaction sa = {
+	.sa_sigaction = signal_handler,
+	.sa_flags = SA_SIGINFO,
+};
+
 int main(int argc, char *argv[], char *envp[])
 {
 	int i;
 	int shmid;
 	ino_t *shm;
 	int num_sharings;
+	int status;
+	int child_failed = 0;
+	int ret;
 	test_init(argc, argv);
+	ret = sigaction(SIGSEGV, &sa, NULL);
+	if (ret < 0)
+		FAIL("Installing SIGSEGV handler failed: %s",
+				strerror(errno));
+
 	if (argc == 2) {
 		/*
 		 * first process
@@ -150,7 +173,7 @@ int main(int argc, char *argv[], char *e
 		 */
 		char *env;
 		pid_t *children;
-		int ret, j;
+		int j;
 		/* both default to 0 */
 		int sharing = 0, elfmap_off = 0;
@@ -216,12 +239,28 @@ int main(int argc, char *argv[], char *e
 			}
 		}
 		for (i = 0; i < num_sharings; i++) {
-			ret = waitpid(children[i], NULL, 0);
+			ret = waitpid(children[i], &status, 0);
 			if (ret < 0) {
 				shmctl(shmid, IPC_RMID, NULL);
 				shmdt(shm);
 				FAIL("waitpid failed: %s", strerror(errno));
 			}
+			if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
+				child_failed++;
+				verbose_printf("Child %d exited with non-zero status: %d\n",
+					i + 1, WEXITSTATUS(status));
+			}
+			if (WIFSIGNALED(status)) {
+				child_failed++;
+				verbose_printf("Child %d killed by signal: %s\n", i + 1,
+					strsignal(WTERMSIG(status)));
+			}
+		}
+		if (child_failed) {
+			shmctl(shmid, IPC_RMID, NULL);
+			shmdt(shm);
+			FAIL("%d of %d children exited abnormally",
+				child_failed, num_sharings);
 		}
 		for (i = 0; i < NUM_TESTS; i++) {
 			ino_t base = shm[i];

[2]

HUGETLB_SHARE=2 xB.linkshare 2 (32): PASS
HUGETLB_SHARE=2 xB.linkshare 2 (64): PASS
HUGETLB_SHARE=1 xB.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=1 xB.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=2 xBDT.linkshare 2 (32): PASS
HUGETLB_SHARE=2 xBDT.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=1 xBDT.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
HUGETLB_SHARE=1 xBDT.linkshare 2 (64): FAIL 2 of 2 children exited abnormally

[3]

[92721.340866] xBDT.linkshare[23592]: segfault at 00002b0f78891b40 rip 00002b0f78891b40 rsp 00007fff5db771f8 error 14
[92721.342742] xBDT.linkshare[23593]: segfault at 00002b0f78891b40 rip 00002b0f78891b40 rsp 00007fffdad02388 error 14

[4]

HUGETLB_SHARE=2 xB.linkshare 2 (32):
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x9000000-0x9010048 (filesz=0) (prot = 0x7)
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9000000...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xB.linkshare", pid 23827
Segment remapping enabled, sharing = 2
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x9000000-0x9010048 (filesz=0) (prot = 0x7)
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x9000000-0x9010048 (filesz=0) (prot = 0x7)
Starting testcase "xB.linkshare", pid 23828
Starting testcase "xB.linkshare", pid 23829
PASS

HUGETLB_SHARE=2 xB.linkshare 2 (64):
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x1000000-0x1010050 (filesz=0) (prot = 0x7)
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0 bytes from 0x1000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x1000000...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xB.linkshare", pid 23836
Segment remapping enabled, sharing = 2
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x1000000-0x1010050 (filesz=0) (prot = 0x7)
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x1000000-0x1010050 (filesz=0) (prot = 0x7)
Child 1 killed by signal: Segmentation fault
Child 2 killed by signal: Segmentation fault
FAIL 2 of 2 children exited abnormally

HUGETLB_SHARE=1 xB.linkshare 2 (32):
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x9000000-0x9010048 (filesz=0) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9000000...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xB.linkshare", pid 23857
Segment remapping enabled, sharing = 1
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x9000000-0x9010048 (filesz=0) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9000000...
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9000000...
...done
libhugetlbfs: Prepare succeeded
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xB.linkshare", pid 23858
Process 23858 got a Segmentation Fault at address 0x5556f008
Starting testcase "xB.linkshare", pid 23859
Process 23859 got a Segmentation Fault at address 0x5556f028
Child 1 exited with non-zero status: 2
Child 2 exited with non-zero status: 2
FAIL 2 of 2 children exited abnormally

HUGETLB_SHARE=1 xB.linkshare 2 (64):
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x1000000-0x1010050 (filesz=0) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0 bytes from 0x1000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x1000000...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xB.linkshare", pid 23872
Segment remapping enabled, sharing = 1
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x1000000-0x1010050 (filesz=0) (prot = 0x7)
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 4): 0x1000000-0x1010050 (filesz=0) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0 bytes from 0x1000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x1000000...
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0 bytes from 0x1000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x1000000...
...done
libhugetlbfs: Prepare succeeded
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xB.linkshare", pid 23873
Process 23873 got a Segmentation Fault at address 0x2abb87143010
Child 1 exited with non-zero status: 2
Starting testcase "xB.linkshare", pid 23874
Process 23874 got a Segmentation Fault at address 0x2b03ebc73050
Child 2 exited with non-zero status: 2
FAIL 2 of 2 children exited abnormally

HUGETLB_SHARE=2 xBDT.linkshare 2 (32):
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x8000000-0x8012198 (filesz=0x12198) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x9000000-0x90203a8 (filesz=0x10350) (prot = 0x7)
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0x12198 bytes from 0x8000000...
...done
libhugetlbfs: Prepare succeeded
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0x10350 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9010360...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23893
Segment remapping enabled, sharing = 2
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x8000000-0x8012198 (filesz=0x12198) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x9000000-0x90203a8 (filesz=0x10350) (prot = 0x7)
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x8000000-0x8012198 (filesz=0x12198) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x9000000-0x90203a8 (filesz=0x10350) (prot = 0x7)
Starting testcase "xBDT.linkshare", pid 23894
Starting testcase "xBDT.linkshare", pid 23895
PASS

HUGETLB_SHARE=2 xBDT.linkshare 2 (64):
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x1000000-0x101266c (filesz=0x1266c) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x2000000-0x2020590 (filesz=0x1053c) (prot = 0x7)
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0x1266c bytes from 0x1000000...
...done
libhugetlbfs: Prepare succeeded
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0x1053c bytes from 0x2000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x2010540...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23902
Segment remapping enabled, sharing = 2
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x1000000-0x101266c (filesz=0x1266c) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x2000000-0x2020590 (filesz=0x1053c) (prot = 0x7)
libhugetlbfs: HUGETLB_SHARE=2, sharing enabled for all segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x1000000-0x101266c (filesz=0x1266c) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x2000000-0x2020590 (filesz=0x1053c) (prot = 0x7)
Child 1 killed by signal: Segmentation fault
Child 2 killed by signal: Segmentation fault
FAIL 2 of 2 children exited abnormally

HUGETLB_SHARE=1 xBDT.linkshare 2 (32):
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x8000000-0x8012198 (filesz=0x12198) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x9000000-0x90203a8 (filesz=0x10350) (prot = 0x7)
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0x12198 bytes from 0x8000000...
...done
libhugetlbfs: Prepare succeeded
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0x10350 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9010360...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23923
Segment remapping enabled, sharing = 1
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x8000000-0x8012198 (filesz=0x12198) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x9000000-0x90203a8 (filesz=0x10350) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0x10350 bytes from 0x9000000...
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x8000000-0x8012198 (filesz=0x12198) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x9000000-0x90203a8 (filesz=0x10350) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x55800000. Copying 0x10350 bytes from 0x9000000...
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9010360...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23925
Process 23925 got a Segmentation Fault at address 0x5556f020
...done
libhugetlbfs: Copying extra 0x8 bytes from 0x9010360...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23924
Process 23924 got a Segmentation Fault at address 0x5556f000
Child 1 exited with non-zero status: 2
Child 2 exited with non-zero status: 2
FAIL 2 of 2 children exited abnormally

HUGETLB_SHARE=1 xBDT.linkshare 2 (64):
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x1000000-0x101266c (filesz=0x1266c) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x2000000-0x2020590 (filesz=0x1053c) (prot = 0x7)
libhugetlbfs: Got unpopulated shared fd -- Preparing
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0x1266c bytes from 0x1000000...
...done
libhugetlbfs: Prepare succeeded
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0x1053c bytes from 0x2000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x2010540...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23938
Segment remapping enabled, sharing = 1
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x1000000-0x101266c (filesz=0x1266c) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x2000000-0x2020590 (filesz=0x1053c) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0x1053c bytes from 0x2000000...
libhugetlbfs: HUGETLB_SHARE=1, sharing enabled for only read-only segments
libhugetlbfs: Hugepage segment 0 (phdr 2): 0x1000000-0x101266c (filesz=0x1266c) (prot = 0x5)
libhugetlbfs: Hugepage segment 1 (phdr 3): 0x2000000-0x2020590 (filesz=0x1053c) (prot = 0x7)
libhugetlbfs: Mapped hugeseg at 0x2aaaaac00000. Copying 0x1053c bytes from 0x2000000...
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x2010540...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23940
Process 23940 got a Segmentation Fault at address 0x2b33f127f040
...done
libhugetlbfs: Copying extra 0x10 bytes from 0x2010540...
...done
libhugetlbfs: Prepare succeeded
Starting testcase "xBDT.linkshare", pid 23939
Process 23939 got a Segmentation Fault at address 0x2b8f03d29000
Child 1 exited with non-zero status: 2
Child 2 exited with non-zero status: 2
FAIL 2 of 2 children exited abnormally

[5]

[92773.453097] xB.linkshare[23838]: segfault at 00002b980301fb88 rip 00002ae67e48f315 rsp 00007fff2c9856d0 error 4
[92773.456395] xB.linkshare[23837]: segfault at 00002b980301fb88 rip 00002b3e89b2f315 rsp 00007fff212e7030 error 4
[92779.769836] xBDT.linkshare[23903]: segfault at 00002b7604c72b40 rip 00002b7604c72b40 rsp 00007fff5df255b8 error 14
[92779.772422] xBDT.linkshare[23904]: segfault at 00002b7604c72b40 rip 00002b7604c72b40 rsp 00007fff479fd098 error 14

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center