The mismatch between the synchronous nature of Marionette tests and the
asynchronous nature of the code we want to test just begs for race
conditions. And clearly we've got a lot.

But I agree with jst that we can't make excuses here. We've got to fix the
tests.

And, going forward, we've got to figure out a better, less error-prone way
to write them.

As we fix them, let's pay attention to what the underlying issues are. I
predict that we'll find a handful of common errors repeated over and over
again. Maybe we can improve the Marionette API to make them less common.
(Can we improve things with Promises?) Or at least we could end up with
some kind of "HOWTO write Marionette tests that are not racy" best
practices guide.  It would be nice, for example, if we had a naming
convention for functions that would help test writers distinguish those
that block until some condition is true from those that do not block.
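
As a rough illustration only (the helper names below are made up, not part
of the current Marionette client API), a small Promise-based wait helper
plus a "waitFor*" naming convention would make it obvious at the call site
which calls block on a condition and which return immediately:

  // Sketch: `waitFor` and `waitForElementDisplayed` are hypothetical names
  // used to illustrate the convention, not existing API.

  // Resolves once `condition()` returns a truthy value, polling every
  // `interval` ms; rejects if `timeout` ms pass first.
  function waitFor(condition, interval = 100, timeout = 10000) {
    const deadline = Date.now() + timeout;
    return new Promise((resolve, reject) => {
      (function poll() {
        if (condition()) {
          resolve();
        } else if (Date.now() >= deadline) {
          reject(new Error('waitFor: condition not met within ' + timeout + 'ms'));
        } else {
          setTimeout(poll, interval);
        }
      })();
    });
  }

  // The "waitFor" prefix signals that this call blocks until the element is
  // actually visible; a plain getter would return right away. Assumes a
  // client object exposing synchronous findElement()/displayed().
  function waitForElementDisplayed(client, selector) {
    return waitFor(() => {
      try {
        return client.findElement(selector).displayed();
      } catch (e) {
        return false; // element not in the DOM yet; keep polling
      }
    });
  }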

David S has set us the task of converting our Python Marionette tests to
JS.  Maybe we can try to get a handle on the raciness issues as part of
that conversion. It would be nice if there were some way to ensure that
this new batch of tests we'll be writing will not have the automatic retry
that the existing tests have.
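
For instance (purely a sketch, and assuming the new tests end up on a
Mocha-style runner that supports per-suite retry settings, which may not be
what we choose), pinning retries to zero would make a racy test fail loudly
instead of quietly passing on a re-run:

  // Sketch only: suite/test names are made up; this is not a proposal for a
  // specific harness, just an illustration of "no automatic retry".
  suite('browser launch', function() {
    this.retries(0); // a racy run fails immediately instead of being retried

    test('opens the default start page', function() {
      // ... test body ...
    });
  });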

  David

On Wed, Nov 4, 2015 at 8:10 AM, Johnny Stenback <[email protected]> wrote:

> On Wed, Nov 4, 2015 at 7:48 AM, Michael Henretty <[email protected]>
> wrote:
> >
> > On Wed, Nov 4, 2015 at 4:45 PM, Fabrice Desré <[email protected]>
> wrote:
> >>
> >> Can we *right now* identify the worst offenders by looking at the test
> >> results/re-runs? You know that sheriffs will very quickly hide and
> >> ignore tests that are really flaky.
> >
> >
> >
> > Yes, that's an important point. The problem is that you have to actually
> > look at the logs of an individual chunk to see which tests failed. If a
> > certain Gij test passes at least 1 out of its 5 given runs, it will not
> > surface to Treeherder, which means we can't star it. Looking through each
> > chunk log file (of which we have 40 per run) is doable, but more time
> > consuming and error prone.
>
> Jumping in on something I haven't been able to pay much attention to
> myself, so I may be missing context here, but this sounds like it sets
> people up to assume that if something occasionally works we're good to
> ship it, as opposed to if it occasionally fails we need to fix it. Seems
> to me that this needs to be flipped around very aggressively for these
> tests to provide much value.
>
> - jst