Jonas, that particular error is on the decline. Many instances went away
when we rolled out a series of fixes to run the tests on devices. The error
itself was a symptom of a different issue, and I would imagine that the
ones we still see are likely also not directly related to sockit-to-me.

Even so, we recognize that synchronous TCP socket usage isn't ideal. We
never really thought it was; it was just the best way to make the tests
easy to write.

Fast-forward to now: we're adding a promise-based TCP driver for Marionette,
which will enable new tests to be written using promises. Marionette calls
would always return a promise, which you could .then() to do something
else. It's a much nicer, more standardized pattern.
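
To give a rough feel for that pattern, here is a minimal sketch of a test
step written as a promise chain. Everything in it is an assumption made for
illustration: FakeClient, its method names (send, goUrl, findElement, click)
and the app:// URL are hypothetical stand-ins, not the API of the new driver,
and nothing here talks to a real socket.

```typescript
// Hypothetical stand-in for a promise-based Marionette client. The class and
// method names are illustrative assumptions, not the real driver's API.
type Command = { name: string; params: Record<string, string> };

class FakeClient {
  // Commands "sent" so far, in order. A real driver would write to a TCP
  // socket here instead of recording them in memory.
  readonly sent: Command[] = [];

  // Every call returns a promise instead of blocking on the socket.
  send(name: string, params: Record<string, string> = {}): Promise<string> {
    this.sent.push({ name, params });
    return Promise.resolve(`ok:${name}`);
  }

  goUrl(url: string): Promise<string> {
    return this.send("goUrl", { url });
  }

  findElement(selector: string): Promise<string> {
    return this.send("findElement", { selector });
  }

  click(elementId: string): Promise<string> {
    return this.send("click", { elementId });
  }
}

// A test step as a promise chain: each .then() callback runs only after the
// previous command's promise has resolved. Resolves to the command names
// in the order they were issued.
function openMenu(client: FakeClient): Promise<string[]> {
  return client
    .goUrl("app://example.gaiamobile.org") // placeholder URL
    .then(() => client.findElement("#menu"))
    .then((elementId) => client.click(elementId))
    .then(() => client.sent.map((c) => c.name));
}
```

Calling openMenu(new FakeClient()) returns right away with a promise. Only
the first command has been issued at that point, and the rest follow as each
step resolves, which is the main difference from the current blocking calls.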

Note that we *WILL* be seeking *ALL OF YOUR HELP* to port existing tests to
the new driver. There are simply too many for any single team to handle.

FYI, Andre Natal is working on the new TCP driver for Marionette.

On Tue, Nov 17, 2015 at 4:35 PM, Jonas Sicking <[email protected]> wrote:

> Jumping in on an old thread here.
>
> I 100% agree that getting rid of the intermittent failures is really
> important. Especially the retry-three-times thing that we are doing.
>
> One really important problem that we need to solve is that we have a
> test harness problem which is causing the socket that marionette uses
> to sometimes disconnect.
>
> This means that any test can and does fail intermittently. And fairly
> often as I understand it.
>
> At the very least we should detect that this is the problem and rerun
> the test. (This is ok since the broken socket is a marionette bug and
> not a product bug). But even better is of course to find the source of
> this disconnect and fix it.
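
Inline note on the detect-and-rerun idea: a minimal sketch of what it could
look like, assuming the harness can tell a socket disconnect apart from a
genuine test failure. The Outcome type and runWithHarnessRetry are
hypothetical names for illustration; nothing like this is confirmed to exist
in the current runner.

```typescript
// Possible results of a single test run. "harness-disconnect" marks a run
// that died because the harness socket dropped, not because the product broke.
type Outcome = "pass" | "fail" | "harness-disconnect";

// Reruns a test only while it keeps hitting harness disconnects, up to
// maxReruns extra attempts. A genuine "fail" is reported immediately and is
// never retried, so real intermittents stay visible.
function runWithHarnessRetry(
  runOnce: () => Outcome,
  maxReruns: number
): { outcome: Outcome; reruns: number } {
  let reruns = 0;
  let outcome = runOnce();
  while (outcome === "harness-disconnect" && reruns < maxReruns) {
    reruns += 1;
    outcome = runOnce();
  }
  return { outcome, reruns };
}
```

The point of separating the two cases is that a rerun is only justified when
the harness is at fault; a product failure should surface on the first
attempt.
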
>
> Many have tried to find and fix this problem, but it's hard since it
> only reproduces intermittently. One possible approach would be to try
> to catch this in rr. I don't think that has been tried yet.
>
> / Jonas
>
>
> On Wed, Nov 4, 2015 at 7:39 AM, Michael Henretty <[email protected]>
> wrote:
> > Hi Gaia Folk,
> >
> > If you've been doing Gaia core work for any length of time, you are
> > probably aware that we have *many* intermittent Gij test failures on
> > Treeherder [1]. But the problem is even worse than you may know! You see,
> > each Gij test is run 5 times within a test chunk (e.g. Gij4) before it is
> > marked as failing. Then that chunk itself is retried up to 5 times before
> > the whole thing is marked as failing. This means that for a test to be
> > marked as "passing," it only has to run successfully once in 25 times.
> > I'm not kidding. Our retry logic, especially the retries inside the test
> > chunk, makes it hard to know which intermittent tests are our worst
> > offenders. This is bad.
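
Inline note on the arithmetic: 5 runs per chunk times 5 chunk attempts gives
25 chances, and one success is enough. A small sketch of what that does to a
flaky test's visibility, under the simplifying assumption that every attempt
fails independently with the same probability (real intermittents may well be
correlated). passProbability is a hypothetical helper, not part of any
harness.

```typescript
// Probability that a test is reported as passing when a single success in
// any attempt is enough. Assumes independent attempts with a fixed
// per-attempt failure rate, which is a simplification.
function passProbability(
  failureRate: number,
  runsPerChunk: number,
  chunkAttempts: number
): number {
  const totalAttempts = runsPerChunk * chunkAttempts;
  // The test is only marked failing if every single attempt fails.
  return 1 - Math.pow(failureRate, totalAttempts);
}

// A test that fails 80% of the time, with the current 5 x 5 retry scheme.
const withInChunkRetries = passProbability(0.8, 5, 5);

// The same test if in-chunk retries are dropped and only the 5 chunk
// attempts remain.
const withoutInChunkRetries = passProbability(0.8, 1, 5);
```

Under these assumptions, a test that fails 80% of the time is still reported
green about 99.6% of the time with 25 attempts, versus about 67% with 5.
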
> >
> > My suggestion is to stop doing the retries inside the chunks. That way,
> > the failures will at least surface on Treeherder, which means we can star
> > more tests, which means we'll have a lot more visibility on the bad
> > intermittents. Sheriffs will complain a lot, so we have to be ready to act
> > on these bugs. But the alternative is that we continue to write tests with
> > a low "raciness" bar which, IMO, have a much lower chance of catching
> > regressions. The longer we wait, the worse this problem becomes.
> >
> > Thoughts?
> >
> > Thanks,
> > Michael
> >
> > [1]
> > https://bugzilla.mozilla.org/buglist.cgi?keywords=intermittent-failure&keywords_type=allwords&list_id=12657856&resolution=---&query_format=advanced&product=Firefox%20OS
> >
> > _______________________________________________
> > dev-fxos mailing list
> > [email protected]
> > https://lists.mozilla.org/listinfo/dev-fxos
> >