On Tue, 2006-11-28 at 13:49 -0800, Nishanth Aravamudan wrote:
> On 16.11.2006 [10:50:13 +1100], David Gibson wrote:
> > On Wed, Nov 15, 2006 at 01:09:07PM -0800, Nishanth Aravamudan wrote:
> > > On 15.11.2006 [10:41:19 +1100], David Gibson wrote:
> > > > On Tue, Nov 14, 2006 at 02:33:59PM -0800, Nishanth Aravamudan wrote:
> > > > > On 14.11.2006 [12:21:36 -0800], Nishanth Aravamudan wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I'm hitting a brick wall debugging the linkshare segfaults I'm
> > > > > > seeing.
> > > > > >
> > > > > > (These logs are from my 2-way x86_64, but I'm seeing similar
> > > > > > issues on a G5 (ppc64):
> > > > > >
> > > > > > HUGETLB_SHARE=2 xB.linkshare 2 (32): PASS
> > > > > > HUGETLB_SHARE=2 xB.linkshare 2 (64): PASS
> > > > > > HUGETLB_SHARE=1 xB.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
> > > > > > HUGETLB_SHARE=1 xB.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
> > > > > > HUGETLB_SHARE=2 xBDT.linkshare 2 (32): PASS
> > > > > > HUGETLB_SHARE=2 xBDT.linkshare 2 (64): PASS
> > > > > > HUGETLB_SHARE=1 xBDT.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
> > > > > > HUGETLB_SHARE=1 xBDT.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
> > > > > >
> > > > > > With all 4 failures being segmentation faults we caught.
> > > > >
> > > > > /me hangs head in shame.
> > > > >
> > > > > This is all probably just a stupid programming error on my part.
> > > > > I'll have a fix, I think, once I return from class.
> > > >
> > > > Btw, some of the existing testcases (e.g. alloc-instantiate-race) use
> > > > strsignal() and WTERMSIG() to give a more informative message when a
> > > > child is killed by a signal. It's probably a good idea to use that
> > > > here too, so you can see they died with a SEGV at first glance.
> > >
> > > Yes, this is done with a verbose test run. If I were to do it via a
> > > FAIL statement, we'd get 3 FAIL lines for every failing case. I
> > > suppose I could add a FAIL_CONT() for this...
> >
> > Um.. I don't really follow you.
>
> If you look at the patch I sent previously, we do print out the signal
> information with strsignal, but via verbose_printf(). If I were to do so
> via a FAIL() line, we'd either only print out the signal for the first
> child (since the testcase would fail immediately), or we'd have to add a
> FAIL_CONT() or something to allow me to indicate failure without failing
> the testcase immediately.
>
> In any case, I've spent a good amount of time cleaning and fixing the
> linkshare testcase yesterday and today. Here is what I have so far. We
> are still getting segfaults on xBDT.linkshare 64-bit with
> HUGETLB_SHARE=2, but I went and checked and it's not a testcase issue, I
> don't think. We also will segfault, for instance, if xBDT.linkhuge
> 64-bit is run manually with HUGETLB_SHARE=2 two times in a row. I am now
> looking into the root cause of this failure.
>
> The patch is pretty much ready for inclusion, I think, but I'd like one
> more round of review.
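
For reference, the reporting question above (one signal message per child, without failing on the first one) could be handled roughly as sketched below. This is only an illustration, not the code from the patch: describe_child() and reap_children() are hypothetical names, not existing libhugetlbfs test helpers, and the sketch assumes the parent has already forked its children. It relies only on the standard waitpid(), WIFSIGNALED(), WTERMSIG() and strsignal() calls.

```c
#define _POSIX_C_SOURCE 200809L	/* for strsignal() */

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: format a one-line description of how a child
 * ended, given the status word filled in by waitpid().
 * Returns 0 for a clean exit (status 0), 1 for anything else.
 */
static int describe_child(pid_t pid, int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (WIFSIGNALED(status)) {
		snprintf(buf, len, "child %ld killed by signal: %s",
			 (long)pid, strsignal(WTERMSIG(status)));
		return 1;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
		snprintf(buf, len, "child %ld exited normally", (long)pid);
		return 0;
	}
	if (WIFEXITED(status)) {
		snprintf(buf, len, "child %ld exited with status %d",
			 (long)pid, WEXITSTATUS(status));
		return 1;
	}
	snprintf(buf, len, "child %ld ended with unexpected status 0x%x",
		 (long)pid, (unsigned int)status);
	return 1;
}

/*
 * Hypothetical helper: reap nchildren children, print one line for
 * each abnormal one, and return how many ended abnormally. Nothing
 * here aborts early, so every child gets reported.
 */
static int reap_children(int nchildren)
{
	int abnormal = 0;

	for (int i = 0; i < nchildren; i++) {
		char msg[128];
		int status;
		pid_t pid = waitpid(-1, &status, 0);

		if (pid < 0) {
			perror("waitpid");
			abnormal++;
			continue;
		}
		if (describe_child(pid, status, msg, sizeof(msg))) {
			fprintf(stderr, "%s\n", msg);
			abnormal++;
		}
	}
	return abnormal;
}
```

With something along these lines the testcase would call reap_children() once after forking and issue a single FAIL() only if the returned count is nonzero, which sidesteps the need for a FAIL_CONT().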

It's definitely a lot cleaner looking. Seems to me we can apply the same
logic that we did to the daemon removal code. If this updated version
yields the same test results and is cleaner, let's merge it and continue
working on it in-tree.

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel