Re: Exit code -11 must die

Randell Jesup Mon, 29 Feb 2016 09:30:27 -0800

>On 2/27/2016 9:06 PM, Randell Jesup wrote:
>> months until recently it popped up a bit).  Note that this failure
>> *never* results in a crashdump, and I've never seen it locally, just in
>> Automation.
>
>What we do know:
>
> * Exit code -11 is evidence of a SIGSEGV (crash).
>
>This I don't know, but somebody may know (+ted):
>
> * Are we sure that the crash is happening in firefox.exe? Or is it
>   possible that some other process is crashing and taking down our
>   test harness with it?
> * Can somebody point to exactly what line of code in the test harness
>   collects the -11 code?


See Andrew's comments; I don't know.
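I haven't found the exact line either. But assuming the harness ultimately launches the browser through Python's subprocess module, -11 is simply how Python reports a child killed by signal 11 (SIGSEGV on Linux): a negative returncode -N means "terminated by signal N". Here is a minimal sketch. The `run_and_classify` helper and the self-killing child are hypothetical stand-ins, not harness code:

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run argv and report how it exited (POSIX semantics)."""
    code = subprocess.run(argv).returncode
    if code < 0:
        # subprocess encodes "terminated by signal N" as returncode -N.
        return code, "killed by " + signal.Signals(-code).name
    return code, "exited with status %d" % code


# Hypothetical child that kills itself with SIGSEGV, standing in for a crash.
CRASHER = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]

if __name__ == "__main__":
    print(run_and_classify(CRASHER))  # (-11, 'killed by SIGSEGV') on Linux
```

If the crash reporter never gets a chance to run, a bare -N like this is all the harness would see.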

> * Is there no crash dump because the crash reporter is turned off?
>     o If it's turned on, is the crash reporter itself crashing? Or is
>       the test harness not picking up the crash dump?

I doubt it's turned off, but for some reason it may be unable to catch this crash.

>> We *need* to find some solution to it -- even if it's to decide it's a
>> (safe) artifact of some underlying problem outside of our control.
>
>Is "we" you? Are you asking somebody else to help you with this, or own the
>problem completely?

I need help here, or preferably someone to own this. Getting things to
run and debugging them on the Try VMs is outside my area of expertise.
I have a loaner, but my (very painful) attempts to run the same tests
with the same packages there haven't reproduced it.

Ryan can confirm that this has been hitting us heavily all over the tree
for at least a year.

It may be that these are all valid shutdown bugs that get tagged to
"whichever test ran last". If so, fine, but we need *some* way to get
useful info out about what crashed, where, and why.
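If these are shutdown crashes, the log should at least show whether the last test actually finished. Below is a rough sketch of that check. It assumes the log has one `TEST-START | <name>` line per test and a `TEST-END` line when that test completes. The helper name and the exact line format are assumptions, not the harness's real parser:

```python
def attribute_crash(log_lines, returncode):
    """Hypothetical helper: guess where a signal-killed run died.

    Assumes each test logs a line starting with "TEST-START | <name>" and,
    if it completes, a line starting with "TEST-END".
    """
    if returncode >= 0:
        return None  # not killed by a signal
    last_started = None
    finished = True
    for line in log_lines:
        if line.startswith("TEST-START"):
            last_started = line.partition("|")[2].strip()
            finished = False
        elif line.startswith("TEST-END"):
            finished = True
    if last_started is None:
        return "crashed before any test started"
    if finished:
        # The last test completed, so the crash came later, e.g. at shutdown.
        return "crashed after %s finished (shutdown?)" % last_started
    return "crashed during %s" % last_started


log = ["TEST-START | test_a.html", "TEST-END | test_a.html",
       "TEST-START | test_b.html", "TEST-END | test_b.html"]
print(attribute_crash(log, -11))  # crashed after test_b.html finished (shutdown?)
```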

>> I'd far rather find a true cause and either fix or wallpaper it.  But right
>> now it's stopping me from landing some important code changes.
>>
>> On the plus side, I have a nice Try run which will cause it 100% of the
>> time - though when I tried to provoke it on a loaner Test VM after
>> painfully emulating what's needed to run tests, it wouldn't fail -- but
>> I don't trust that was a well-setup recreation of a real Try run.
>>
>> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2eb01359621
>>
>IIRC, there was recently a post about how you can submit a try job and have
>the VM stay alive afterwards for postmortem debugging. I don't
>remember/can't find the details right now.

I think that's only for TaskCluster jobs; these aren't.

>Can we also submit a try job with rr enabled, and get a recording of the
>failure? That combination could lead to a pretty quick cause diagnosis of
>this, since it's Linux.

I'd love it if we could do that, but with rr I *must* get into the machine
(or a clone of it) to run rr replay.

>Also, does this failure happen if you disable all the tests except for the
>one which is permafailing
>(dom/media/tests/mochitest/identity/test_setIdentityProviderWithErrors.html)?
>If so, that would make it easier to record and debug.

I've tried disabling the failing test, and the failure just moves to the
previous test; I disabled that one too and it moved again.  It is most
certainly permafailing, but it's also clearly timing-sensitive, so
disabling too much may make it disappear.  It may be worth trying, but
the cycle time is really long.

Thanks!
-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
