On 28.11.2006 [16:07:41 -0600], Adam Litke wrote:
> On Tue, 2006-11-28 at 13:49 -0800, Nishanth Aravamudan wrote:
> > On 16.11.2006 [10:50:13 +1100], David Gibson wrote:
> > > On Wed, Nov 15, 2006 at 01:09:07PM -0800, Nishanth Aravamudan wrote:
> > > > On 15.11.2006 [10:41:19 +1100], David Gibson wrote:
> > > > > On Tue, Nov 14, 2006 at 02:33:59PM -0800, Nishanth Aravamudan wrote:
> > > > > > On 14.11.2006 [12:21:36 -0800], Nishanth Aravamudan wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm hitting a brick wall debugging the linkshare segfaults I'm seeing.
> > > > > > >
> > > > > > > These logs are from my 2-way x86_64, but I'm seeing similar
> > > > > > > issues on a G5 (ppc64):
> > > > > > >
> > > > > > > HUGETLB_SHARE=2 xB.linkshare 2 (32):    PASS
> > > > > > > HUGETLB_SHARE=2 xB.linkshare 2 (64):    PASS
> > > > > > > HUGETLB_SHARE=1 xB.linkshare 2 (32):    FAIL    2 of 2 children exited abnormally
> > > > > > > HUGETLB_SHARE=1 xB.linkshare 2 (64):    FAIL    2 of 2 children exited abnormally
> > > > > > > HUGETLB_SHARE=2 xBDT.linkshare 2 (32):  PASS
> > > > > > > HUGETLB_SHARE=2 xBDT.linkshare 2 (64):  PASS
> > > > > > > HUGETLB_SHARE=1 xBDT.linkshare 2 (32):  FAIL    2 of 2 children exited abnormally
> > > > > > > HUGETLB_SHARE=1 xBDT.linkshare 2 (64):  FAIL    2 of 2 children exited abnormally
> > > > > > >
> > > > > > > All 4 failures are segmentation faults that we caught.
> > > > > >
> > > > > > /me hangs head in shame.
> > > > > >
> > > > > > This is all probably just a stupid programming error on my part.
> > > > > > I'll have a fix, I think, once I return from class.
> > > > >
> > > > > Btw, some of the existing testcases (e.g. alloc-instantiate-race) use
> > > > > strsignal() and WTERMSIG() to give a more informative message when a
> > > > > child is killed by a signal.  It's probably a good idea to use that
> > > > > here too, so you can see they died with a SEGV at first glance.
> > > >
> > > > Yes, this is done with a verbose test run. If I were to do it via a
> > > > FAIL statement, we'd get 3 FAIL lines for every failing case. I
> > > > suppose I could add a FAIL_CONT() for this...
> > >
> > > Um.. I don't really follow you.
> >
> > If you look at the patch I sent previously, we do print out the signal
> > information with strsignal, but via verbose_printf(). If I were to do so
> > via a FAIL() line, we'd either only print out the signal for the first
> > child (since the testcase would fail immediately), or we'd have to add a
> > FAIL_CONT() or something to allow me to indicate failure without failing
> > the testcase immediately.
> >
> > In any case, I've spent a good amount of time cleaning and fixing the
> > linkshare testcase yesterday and today. Here is what I have so far. We
> > are still getting segfaults on xBDT.linkshare 64-bit with
> > HUGETLB_SHARE=2, but I went and checked and it's not a testcase issue, I
> > don't think. We also will segfault, for instance, if xBDT.linkhuge
> > 64-bit is run manually with HUGETLB_SHARE=2 two times in a row. I am now
> > looking into the root cause of this failure.

I also should have mentioned that I quickly checked the old sharing code
to see whether xBDT.linkshare (or xBDT.linkhuge) with HUGETLB_SHARE=2
ever worked, and it appears it did not (still 64-bit only). So we're no
worse off than before, but it's still unclear to me why it fails at all
in that particular combination.

> > The patch is pretty much ready for inclusion, I think, but I'd like one
> > more round of review.
> 
> It's definitely a lot cleaner looking.  Seems to me we can apply the
> same logic that we did to the daemon removal code.  If this updated
> version yields the same test results and is cleaner, let's merge it
> and continue working on it in-tree.

Yup, it does have the same results as far as PASS/FAIL. And the logic
remains unchanged except for a few small things I'm adding comments for
as we speak.

I will do some more testing, and then check it into my local tree once
others have had a chance to review.

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
