On 28.11.2006 [16:07:41 -0600], Adam Litke wrote:
> On Tue, 2006-11-28 at 13:49 -0800, Nishanth Aravamudan wrote:
> > On 16.11.2006 [10:50:13 +1100], David Gibson wrote:
> > > On Wed, Nov 15, 2006 at 01:09:07PM -0800, Nishanth Aravamudan wrote:
> > > > On 15.11.2006 [10:41:19 +1100], David Gibson wrote:
> > > > > On Tue, Nov 14, 2006 at 02:33:59PM -0800, Nishanth Aravamudan wrote:
> > > > > > On 14.11.2006 [12:21:36 -0800], Nishanth Aravamudan wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm hitting a brick wall debugging the linkshare segfaults I'm
> > > > > > > seeing.
> > > > > > >
> > > > > > > (These logs are from my 2-way x86_64, but I'm seeing similar
> > > > > > > issues on a G5 (ppc64):
> > > > > > >
> > > > > > > HUGETLB_SHARE=2 xB.linkshare 2 (32): PASS
> > > > > > > HUGETLB_SHARE=2 xB.linkshare 2 (64): PASS
> > > > > > > HUGETLB_SHARE=1 xB.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
> > > > > > > HUGETLB_SHARE=1 xB.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
> > > > > > > HUGETLB_SHARE=2 xBDT.linkshare 2 (32): PASS
> > > > > > > HUGETLB_SHARE=2 xBDT.linkshare 2 (64): PASS
> > > > > > > HUGETLB_SHARE=1 xBDT.linkshare 2 (32): FAIL 2 of 2 children exited abnormally
> > > > > > > HUGETLB_SHARE=1 xBDT.linkshare 2 (64): FAIL 2 of 2 children exited abnormally
> > > > > > >
> > > > > > > With all 4 failures being segmentation faults we caught.
> > > > > >
> > > > > > /me hangs head in shame.
> > > > > >
> > > > > > This is all probably just a stupid programming error on my part.
> > > > > > I'll have a fix, I think, once I return from class.
> > > > >
> > > > > Btw, some of the existing testcases (e.g. alloc-instantiate-race) use
> > > > > strsignal() and WTERMSIG() to give a more informative message when a
> > > > > child is killed by a signal. It's probably a good idea to use that
> > > > > here too, so you can see they died with a SEGV at first glance.
> > > >
> > > > Yes, this is done with a verbose test run. If I were to do it via a
> > > > FAIL statement, we'd get 3 FAIL lines for every failing case. I
> > > > suppose I could add a FAIL_CONT() for this...
> > >
> > > Um.. I don't really follow you.
> >
> > If you look at the patch I sent previously, we do print out the signal
> > information with strsignal, but via verbose_printf(). If I were to do so
> > via a FAIL() line, we'd either only print out the signal for the first
> > child (since the testcase would fail immediately), or we'd have to add a
> > FAIL_CONT() or something to allow me to indicate failure without failing
> > the testcase immediately.
> >
> > In any case, I've spent a good amount of time cleaning and fixing the
> > linkshare testcase yesterday and today. Here is what I have so far. We
> > are still getting segfaults on xBDT.linkshare 64-bit with
> > HUGETLB_SHARE=2, but I went and checked and it's not a testcase issue, I
> > don't think. We also will segfault, for instance, if xBDT.linkhuge
> > 64-bit is run manually with HUGETLB_SHARE=2 two times in a row. I am now
> > looking into the root cause of this failure.
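
For reference, the reporting approach discussed above (decode each child's wait status with WTERMSIG() and strsignal(), and keep going after the first bad child instead of failing immediately) could look roughly like the sketch below. This is only an illustration under assumptions: `describe_child()`, `run_children()`, `child_ok()`, `child_segv()` and `MAX_CHILDREN` are hypothetical names invented here, it is not the code from the patch, and it does not use the libhugetlbfs test macros (FAIL(), verbose_printf(), or the proposed FAIL_CONT()).

```c
#define _POSIX_C_SOURCE 200809L  /* expose strsignal() under strict -std= modes */

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16  /* arbitrary cap chosen for this sketch */

/*
 * Hypothetical helper: turn a wait status into a one-line message.
 * Returns 0 if the child exited with status 0, 1 otherwise.
 */
int describe_child(pid_t pid, int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (WIFSIGNALED(status)) {
        snprintf(buf, len, "child %d killed by signal: %s",
                 (int)pid, strsignal(WTERMSIG(status)));
        return 1;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        snprintf(buf, len, "child %d exited normally", (int)pid);
        return 0;
    }
    if (WIFEXITED(status)) {
        snprintf(buf, len, "child %d exited with status %d",
                 (int)pid, WEXITSTATUS(status));
        return 1;
    }
    snprintf(buf, len, "child %d: unexpected wait status 0x%x",
             (int)pid, (unsigned)status);
    return 1;
}

/* Example child bodies (hypothetical): one succeeds, one dies with SIGSEGV. */
void child_ok(int idx)
{
    (void)idx;
}

void child_segv(int idx)
{
    (void)idx;
    signal(SIGSEGV, SIG_DFL);  /* make sure the default action applies */
    raise(SIGSEGV);
}

/*
 * Hypothetical driver: fork nchildren children, reap every one of them,
 * print one line per abnormal child, and return how many were abnormal.
 * The caller can then report a single overall failure.
 */
int run_children(int nchildren, void (*child_fn)(int))
{
    pid_t pids[MAX_CHILDREN];
    char msg[128];
    int i, status, abnormal = 0;

    assert(nchildren > 0 && nchildren <= MAX_CHILDREN);

    for (i = 0; i < nchildren; i++) {
        pids[i] = fork();
        if (pids[i] == 0) {
            child_fn(i);
            _exit(0);  /* skip atexit handlers and stdio flushing in the child */
        }
    }

    for (i = 0; i < nchildren; i++) {
        if (pids[i] < 0 || waitpid(pids[i], &status, 0) < 0) {
            abnormal++;  /* fork() or waitpid() itself failed */
            continue;
        }
        if (describe_child(pids[i], status, msg, sizeof(msg))) {
            fprintf(stderr, "%s\n", msg);
            abnormal++;
        }
    }
    return abnormal;
}
```

Called as `run_children(2, child_segv)`, this prints one message per dead child and returns the count, so the caller can emit a single "N of M children exited abnormally" failure line afterwards rather than one failure per child.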

I also should have mentioned that I quickly checked with the old sharing
code to see if xBDT.linkshare (or xBDT.linkhuge) with HUGETLB_SHARE=2
ever worked, and it would appear that it did not (64-bit only, still).
So we're no worse off than before, but it's still unclear to me why it's
failing at all in that particular combination.

> > The patch is pretty much ready for inclusion, I think, but I'd like one
> > more round of review.
>
> It's definitely a lot cleaner looking. Seems to me we can apply the
> same logic that we did to the daemon removal code. If this updated
> version yields the same test results and is cleaner, let's merge it
> and continue working on it in-tree.

Yup, it does have the same results as far as PASS/FAIL. And the logic
remains unchanged except for a few small things I'm adding comments for
as we speak. I will do some more testing, and then check it into my
local tree once others have a chance to review.

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel