On Mon, Oct 9, 2017 at 7:38 AM, Sean Busbey <[email protected]> wrote:

> Hi folks!
>
> Lately our precommit runs have had a large amount of noise around unit
> test failures due to timeout, especially for the hbase-server module.
>
>
I've not looked into why the tests time out. Anyone? Usually there is a cause.

...


> I'd really like to get us back to a place where a precommit -1 doesn't
> just result in a reflexive "precommit is unreliable."


This is the default. The exception is when one of us works on stabilizing
the test suite. It takes a while and a bunch of effort, but stabilization
has been doable in the past. Once stable, it stays that way for a while
before the rot sets in.



> * Do fewer parallel executions. We do 5 tests at once now and the
> hbase-server module takes ~1.5 hours. We could tune down just the
> hbase-server module to do fewer.
>


Is it the loading that is the issue, or tests stamping on each other? If
the latter, I'd think we'd want to fix it. If the former, we'd want to look
at it too; I'd think our tests shouldn't be such that they fall over if the
context is other than 'perfect'.

I've not looked at a machine while five concurrent hbase tests are running.
Is it even putting up a load? Over the extent of the full test suite? Or is
it just a few tests that cause issues when run together? Could we stagger
these, give them their own category, or have them burn less brightly?

If tests are failing because of contention for resources, we should fix the
tests. If given a machine, we should burn it up rather than pussy-foot
around it, I'd say (can we size the concurrency off a query of the
underlying OS, so we step by CPUs, say?).
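
To make the "step by CPUs" idea concrete, here is a rough sketch, not
anything that exists in our build today. ForkSizing, forksFor, and the
0.5-forks-per-CPU ratio are made up for illustration; the only real API
used is Runtime.availableProcessors(). The cap of 5 matches the number of
parallel tests we run now.

```java
/** Hypothetical helper: derive a test-fork count from the CPUs the OS reports. */
final class ForkSizing {
    private ForkSizing() {}

    /**
     * Scales the CPU count by forksPerCpu (rounding down), then clamps the
     * result to the range [minForks, maxForks].
     */
    static int forksFor(int cpus, double forksPerCpu, int minForks, int maxForks) {
        if (cpus < 1 || forksPerCpu <= 0 || minForks < 1 || maxForks < minForks) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        int scaled = (int) Math.floor(cpus * forksPerCpu);
        return Math.max(minForks, Math.min(maxForks, scaled));
    }

    public static void main(String[] args) {
        // Ask the JVM how many processors it can see on this host.
        int cpus = Runtime.getRuntime().availableProcessors();
        // Assumed policy: one fork per two CPUs, never fewer than 1, never more than 5.
        int forks = forksFor(cpus, 0.5, 1, 5);
        System.out.println("cpus=" + cpus + " forks=" + forks);
    }
}
```

If I remember right, Maven Surefire's forkCount also accepts a value with a
"C" suffix (0.5C, say) that multiplies by the core count, so something like
the above may already be available without extra code.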

Tests could do with an edit. Generally, tests are written once and then
never touched again. Meantime the system evolves. An edit could look for
redundancy. An edit could look for cases where we start clusters
(time-consuming) when we don't have to (use mocks or start standalone
instances instead). We also have some crazy tests that spin up lots of
clusters all inside a single JVM though the context is the same as that of
a simple method evaluation.
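
As an illustration of the "we don't have to start a cluster" case, here is
a minimal sketch. All the names in it (KeyValueStore, InMemoryStore,
RowCounter) are hypothetical and are not HBase classes; the point is only
that logic sitting behind a narrow interface can be exercised against an
in-memory fake.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage seam; stands in for whatever the code under test talks to. */
interface KeyValueStore {
    Long get(String row);
    void put(String row, long value);
}

/** In-memory fake: no cluster and no network, so it starts instantly. */
final class InMemoryStore implements KeyValueStore {
    private final Map<String, Long> rows = new HashMap<>();

    @Override
    public Long get(String row) {
        return rows.get(row);
    }

    @Override
    public void put(String row, long value) {
        rows.put(row, value);
    }
}

/** Hypothetical logic under test: a read-modify-write counter per row. */
final class RowCounter {
    private final KeyValueStore store;

    RowCounter(KeyValueStore store) {
        this.store = store;
    }

    long increment(String row) {
        Long current = store.get(row);
        // A missing row counts as zero before the increment.
        long next = (current == null ? 0L : current) + 1L;
        store.put(row, next);
        return next;
    }
}
```

A test then constructs new RowCounter(new InMemoryStore()) and checks the
returned counts directly. This only covers the logic, of course; anything
that depends on real cluster behavior still needs a real (or standalone)
instance.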

St.Ack
