Hi Bastian!

> Failures with reason "eunit_replicator"

The replicator tests were fixed very recently and should be fine *now*.

> Failures with reason "eunit_compression"

The compression failure is something new; it seems this test is
occasionally flaky. We'll need some magic debug io:format/2 to the rescue.

> Failures with reason "network"

This happens sometimes when git-wip-us.apache.org is inaccessible.
There are only two options here:
1. Retry until it is back up
2. Fall back to the GitHub mirror
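
A rough sketch of how both options could be combined in a CI helper:
retry the primary repository a few times, then fall back to the mirror.
The function name, the parameters and the defaults are all made up for
illustration, and the repository URLs are left to the caller since I
don't want to guess the exact paths here.

```python
import subprocess
import time


def clone_with_fallback(urls, dest, attempts=3, delay=30.0,
                        run=subprocess.run, sleep=time.sleep):
    """Try each URL in order, retrying each one up to `attempts` times.

    Returns the URL that was cloned successfully, or None if all failed.
    `run` and `sleep` are injectable (hypothetical knobs) to ease testing.
    """
    for url in urls:
        for attempt in range(1, attempts + 1):
            # git exits with a non-zero status when the host is unreachable.
            result = run(["git", "clone", url, dest])
            if result.returncode == 0:
                return url
            if attempt < attempts:
                # Wait before the next try on the same URL.
                sleep(delay)
    return None
```

It would be called with the primary URL first and the mirror URL second,
e.g. clone_with_fallback([primary_url, mirror_url], "couchdb").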

> Failures with reason "docker"

I guess sometimes something goes wrong with the Docker service. It's
worth getting the logs for these failures, but Clemens (@klaemo) may
have some more ideas about this.

> Failures with reason "libdl"

Unfortunately, the only build log referenced returns HTTP 404 Not Found.

> Failures with reason "aborted"

The actual reason is:
/usr/src/couchdb/apache-couchdb-2.0.0-7e892d6/bin/couchjs: error while
loading shared libraries: libmozjs185.so.1.0: cannot open shared
object file: No such file or directory

So it seems that SpiderMonkey was not installed correctly.
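
One way to catch this early would be a small pre-flight check in the
build image that asks the dynamic loader for the library before the
tests start. This is only a sketch: the helper name can_load is made up,
and the library name is the one from the error message above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True
```

Calling can_load("libmozjs185.so.1.0") at the start of a build would
then fail fast with a clear message instead of aborting in couchjs.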


P.S. It would also be nice to send build failure notifications to the
dev@ ML so that we are aware of them.

P.P.S. Thanks for working on CI! That's really cool and helpful.

--
,,,^..^,,,


On Sat, Feb 27, 2016 at 8:11 PM, Bastian Krol
<[email protected]> wrote:
> Hi folks,
>
> some updates regarding the CouchDB CI setup on builds.apache.org.
>
> The CouchDB build job (https://builds.apache.org/job/CouchDB/) now has six
> variations:
>
> * Ubuntu 14.04 with default Erlang
> * Ubuntu 14.04 with Erlang 18.2
> * Debian 8 with default Erlang
> * Debian 8 with Erlang 18.2
> * CentOS 7 with default Erlang
> * CentOS 7 with Erlang 18.2
>
> However, builds fail abysmally often.
>
> I need your help to sort this out and improve the build stability.
>
> I wrote some quick scripts to categorize the failing builds. Please check
> the result here:
>
> https://github.com/basti1302/couchdb-ci/blob/master/utils/analyze-jenkins-logs/ci-errors.markdown
>
> We can ignore the categories "network", "docker" and "aborted". Most
> failures come from failing eunit tests, either replicator (30 failures) or
> compression (10 failures).
>
> Why is that? Are these tests inherently fragile? Is it a symptom of a
> problem with the CI setup (a bug in Docker or something similar)?
>
> Maybe the categorization/root cause analysis is not even correct?
>
> I'd be grateful if people could chime in here.
>
> I really would like to get closer to "a failing build usually means there is
> a problem in the code"-kind of situation :-)
>
> Best regards
>
>   Bastian
