The mismatch between the synchronous nature of Marionette tests and the
asynchronous nature of the code we want to test just begs for race
conditions. And clearly we've got a lot of them.
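
To make the failure mode concrete, here's roughly the shape of test I mean.
The selectors and element names are made up, and the exact client calls are
only illustrative, but the pattern should look familiar:

    // Inside a Gij-style test body, where `client` is the Marionette client.
    var assert = require('assert');

    var button = client.findElement('#send');
    var status = client.findElement('#status');

    // Tapping kicks off async work in the app...
    button.tap();

    // ...but nothing guarantees that work has finished before we check,
    // so this assertion races against the app and only sometimes fails.
    assert.ok(status.displayed());

    // The non-racy version blocks until the condition holds (or the
    // harness times out):
    client.waitFor(function() {
      return status.displayed();
    });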
But I agree with jst that we can't make excuses here. We've got to fix
the tests. And, going forward, we've got to figure out a better, less
error-prone way to write them.

As we fix them, let's pay attention to what the underlying issues are.
I predict that we'll find a handful of common errors repeated over and
over again. Maybe we can improve the Marionette API to make them less
common. (Can we improve things with Promises?) Or at least we could end
up with some kind of "HOWTO write Marionette tests that are not racy"
best practices guide. It would be nice, for example, if we had a naming
convention that helped test writers distinguish functions that block
until some condition is true from those that do not block (a rough
sketch of what I mean is at the end of this mail).

David S has set us the task of converting our python Marionette tests
to JS. Maybe we can try to get a handle on the raciness issues as part
of that conversion. It would also be nice if there were some way to
ensure that this new batch of tests we'll be writing does not get the
automatic retry that the existing tests have.

David

On Wed, Nov 4, 2015 at 8:10 AM, Johnny Stenback <[email protected]> wrote:
> On Wed, Nov 4, 2015 at 7:48 AM, Michael Henretty <[email protected]> wrote:
> >
> > On Wed, Nov 4, 2015 at 4:45 PM, Fabrice Desré <[email protected]> wrote:
> >>
> >> Can we *right now* identify the worst offenders by looking at the
> >> test results/re-runs? You know that sheriffs will very quickly hide
> >> and ignore tests that are really flaky.
> >
> > Yes, that's an important point. The problem is that you have to
> > actually look at the logs of an individual chunk to see which tests
> > failed. If a certain Gij test passes at least 1 out of its 5 given
> > runs, it will not surface to Treeherder, which means we can't star
> > it. Looking through each chunk log file (of which we have 40 per
> > run) is doable, but more time consuming and error prone.
>
> Jumping in on something I haven't been able to pay much attention to
> myself, so I may be missing context here, but this sounds like it sets
> people up to assume that if something occasionally works we're good to
> ship it, as opposed to if it occasionally fails we need to fix it.
> Seems to me that this needs to be flipped around very aggressively for
> these tests to provide much value.
>
> - jst
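
P.S. Here's the rough sketch of the naming-convention / Promise idea I
mentioned above. None of this is an existing Marionette API and the helper
names are invented; the point is just that an "until*" prefix tells the
reader a call waits, while "is*" tells them it checks exactly once and
returns immediately:

    // One-shot, non-blocking check: reports whatever is true right now.
    function isDisplayed(element) {
      return element.displayed();
    }

    // Waiting variant: returns a Promise that only settles once the
    // condition holds, or rejects if it never does within the timeout.
    function untilDisplayed(element, timeoutMs) {
      var deadline = Date.now() + (timeoutMs || 10000);
      return new Promise(function(resolve, reject) {
        (function poll() {
          if (isDisplayed(element)) {
            return resolve(element);
          }
          if (Date.now() > deadline) {
            return reject(new Error('timed out waiting for element'));
          }
          setTimeout(poll, 100);
        })();
      });
    }

    // The name alone tells you which line can take time:
    // untilDisplayed(status).then(function () {
    //   assert.ok(isDisplayed(status));
    // });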

