On Friday, 9 February 2018 03:07:08 PST Olivier Goffart wrote:
> > Which will happen ALL the time. We'll never get back down: when we
> > released
> > Qt 4.2, 4.3 and 4.4, we were happy if only 10 tests failed (that only
> > happened for QWS). For the other platforms, the normal number was a
> > hundred
> > tests failing. Then we spent weeks trying to get the number down.
> > Sorry, I don't want to go back to that.
> That's not exactly how I remember it from the Qt 4.6 and 4.7 times where
> there was no CI yet, but we were actively fixing tests. There were always a
> few red tests on testr.troll.no, but they were mostly flacky tests. What is
> it compared to now with about 100 BLACKLIST files in qtbase alone? (there
> was no blacklist at the time)
Testr is the second generation tool. I am thinking of the previous one,
reported on desert.troll.no. I remember *once* seeing a "0" test failure
reported, again QWS.
We do have BLACKLISTs this time and I complain every time I see one being
added without even an attempt at figuring out what's wrong with the test, or
when the match is overly aggressive ("it fails on Ubuntu in the CI, so it must
fail for everyone using Ubuntu!"). But also remember we have more tests than
10 years ago, probably by a factor of 2. Finally, back then "1" failure
indicated that the entire test executable failed, whether it was 1 test
function, 1 test row, or the program crashed.
> > It would not do us well to look at poorer practices than what we have.
> > Just
> > because everyone else is where we were 10 years ago is no reason for us to
> > go back to it. Show us a *better* model, one that still prevents failures
> > from being added, and we'll consider it.
> How pretentious are you, thinking that everybody else has poorer practice?
> Other projects still manage to release product with passing test. And might
> not get ludicrous release delay "because of the CI system" that was supposed
> to help, but gets in the way instead.
I didn't say everyone does. I even gave an example of a project that has
But I am making a subjective judgement that allowing breakages in and then
fixing them later is a poorer practice. I do not apologise for it.
I am also not saying our method is perfect. Clearly it is not, and that's not
even talking about the implementation and infra.
> Anyway, here is some example of models which works:
> In LLVM, devs commit directly. buildbots build the trunk continuously. In
> case of failure, the buildbot maintainer quickly find out which commit
> likely broke the test and reverts the commit imediatly (or commit a fix if
> the fix is trivial)
Who reverts the commit, the bot or the maintainer?
If it depends on a person, what happens if the person doesn't?
This sounds like our third-generation CI, when we switched to Git, remember
that? Each team had a repository, which the CI built and, if it passed, merged
onto the mainline. If the build was broken, someone in the team had to apply a
fix or a revert.
Do you remember what happened? It took days to get a failure removed.
Sometimes upwards of a week, if the attempt at fix did not actually fix the
> This works because there are buildbot maintainers taking care of the build
> status, and they are not afraid to revert. Also the build and test time is
> reasonable (while still being big), and individual developer can easily run
> all the tests on their machine before submitting patches. (Very few platform
> specific tests).
Therein lies the problem: for any one developer to run all the Qt tests, it
takes a LOT of time. And you usually can't use your machine while it runs the
> On other projects I've seen on github with travis and co.: the tests are
> run for every pull request individually, before it is integrated. The
> reviewer sees the result of the tests and decide whether to merge on not
> based on the result. If one sees that the failures is obviously unrelated
> to the patch, the reviewer will override and merge anyway.
And I use that for TinyCBOR.
But what happens to pull request B when pull request A is merged? Now the test
result is stale. It should be re-done by merging the target branch back into
the PR, or by rebasing (which most projects don't do), to confirm there are no
issues caused by the combination of changes. More often than not, the
maintainer applies a judgement call and accepts the stale results.
TinyCBOR has half a dozen .c source files and three QtTest tests (one of them
is just compilation). It takes about 30 seconds to launch, compile and test
MSVC 2013, 2015 and 2017 on AppVeyor. But it takes about 8 minutes on Travis.
Of those, 7 minutes and 30 seconds are spent updating running apt-get.
> I think this model could be easily applied to Qt.
The part where each change is tested before hand, yes. We've always wanted
The part where changes are merged together and accepted without verifying
whether they cause problems with each other, not so much.
> In particular, there should be an option to merge a patch directly without
> going through CI. (If you are scared of abuse, limit it to admin or
That already exists. It's called "talk to Ossi" and he uses it for changelogs
and the version number bumps.
> This could be used (exceptionally) for urgent patches such as patches that
> fixes the CI and that are not being integrated quickly enough because of
> other unrelated failures, continuing to block the CI for days. That way,
> the CI would not get in the way.
We have the method. We just don't use it.
> In Summary: The CI should be a tool helping the development, and not slowing
> it down. And having a way to override the CI for important patches should
> be an easy and quick way to improve things a bit.
It should not be easy to override the CI. It should be possible, not easy.
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Development mailing list