On Tue, Nov 28, 2006 at 09:56:51PM -0800, Nishanth Aravamudan wrote: [snip] > > I don't really see that failing after the first child comes back with > > a signal is a problem. I don't see any situations where the return > > code from the second child will be vital for debugging. > > Ok, I can change it easily enough. > > > In any case you mustn't call FAIL() directly from the child process. > > While I agree this is messy in principle, I don't see any reason that it > can't be called. It's simply a wrapper for exit()? I see that it calls > cleanup() and there is a claim that cleanup() must be defined in the > tests but this is untrue AFAICT.
If you FAIL() from the child you'll get two separate fail messages which makes a mess of the output. Easiest way to make sure there's a single, comprehensible fail message is to only ever FAIL() (or PASS() or CONFIG()) from the top level process. > > > In any case, I've spent a good amount of time cleaning and fixing the > > > linkshare testcase yesterday and today. Here is what I have so far. We > > > are still getting segfaults on xBDT.linkshare 64-bit with > > > HUGETLB_SHARE=2, but I went and checked and it's not a testcase issue, I > > > don't think. We also will segfault, for instance, if xBDT.linkhuge > > > 64-bit is run manually with HUGETLB_SHARE=2 two times in a row. I am now > > > looking into the root cause of this failure. > > > > What sort of system is this on? I've certainly had 64-bit > > xBDT.linkshare work on some setups. > > This is with the *reworked* linkshare. The version currently in the > library is broken in multiple ways. I thought you said you got failures with both the old and new versions.. > > > The patch is pretty much ready for inclusion, I think, but I'd like one > > > more round of review. > > > > Looks like a nice cleanup, but some further comments below. > > <snip> > > > > +static int child_process(char *self, int num_sharings, int index) > > > { > > > int i; > > > - int shmid; > > > - ino_t *shm; > > > + ino_t ino; > > > + > > > + get_link_string(self); > > > + > > > + shmid = shmget(SHM_KEY, num_sharings * NUM_TESTS * > > > + sizeof(ino_t), 0666); > > > + if (shmid < 0) > > > + FAIL("Child's shmget failed: %s", strerror(errno)); > > > > As mentioned above, a direct FAIL() from the child will always cause > > trouble - what will the parent do. Instead just do an exit() with > > non-zero code, and let the parent do the FAIL() when it gets the bad > > code back from wait(). > > The parent will see a non-zero return status and FAIL itself. FAIL() is > a wrapper for exit with non-zero code, so I'm not sure what the > difference is. Confusing multiple messages. > > > + > > > + shm = shmat(shmid, NULL, 0); > > > + if (shm == (void *)-1) > > > + FAIL("shmat failed: %s", strerror(errno)); > > > > And again. > > I can make these changes to verbose_printf() and exit(RC_FAIL), but I'm > not sure what the difference is. > > > > + for (i = 0; i < NUM_TESTS; i++) { > > > + if (!test_addr_huge(testtab + i)) { > > > + shm[index * NUM_TESTS + i] = 0; > > > + } else { > > > + ino = do_test(testtab + i); > > > + if ((int)ino < 0) { > > > + shmdt(shm); > > > + exit(RC_FAIL); > > > + } > > > + shm[index * NUM_TESTS + i] = ino; > > > + } > > > + } > > > + shmdt(shm); > > > > Man I hate SysV shm. I'm almost certain this communication page stuff > > would be less ugly if we did mmap()s directly. Especially if we used > > an unlinked file, and passed the fd directly through to the children > > by clearing FD_CLOEXEC. > > I was trying to minimize functional changes (beyond fixes) in this > version. You had mentioned the cleanup of a mmap()'d unlinked file > before and I added it to my todo then. But I'm trying to minimize > changes so that things stay working. Fair enough. > > > - children = (pid_t *)malloc(num_sharings * sizeof(pid_t)); > > > - if (!children) > > > + children_pids = (pid_t *)malloc(num_sharings * sizeof(pid_t)); > > > + if (!children_pids) > > > FAIL("malloc failed: %s", strerror(errno)); > > > > Is there really any advantage to having a variable number of children. > > Things could be simplified considerably if we just cut this down to > > the minimal testcase of 2 children, hardcoded. > > I don't know about "considerably", but fair enough. If no one else has > any objections, I can make this change. Note, that it may also be > interesting to run x{B,BDT}.linkhuge with HUGETLB_SHARE set > appropriately, consecutively, as it results in the same sort of > thing That sounds like a good idea. Makes sure HUGETLB_SHARE doesn't mess anything up, even in a case which doesn't use sharing. > (but would only be a "Did it PASS()?" kind of test). Um.. there's another kind? -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Libhugetlbfs-devel mailing list Libhugetlbfs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel