If this doesn't work, we may want to think about temporarily disabling
the problematic profile.
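
For reference, a single profile can be skipped from the command line
by prefixing its id with an exclamation mark. A minimal sketch; the
profile id below is just a placeholder, not one of our actual profiles:

    # temporarily deactivate a hypothetical profile for one build
    mvn clean verify -P '!problematic-profile'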

On 23.05.2016 09:53, Ufuk Celebi wrote:
Caches have been cleared again (see
https://issues.apache.org/jira/browse/INFRA-11773). The first time did
not help; this second request was more an act of desperation. :-(
Let's see what happens now.

On Wed, Apr 27, 2016 at 3:24 PM, Maximilian Michels <m...@apache.org> wrote:
+1 for making an effort to tackle the test stability problems and any
potentially involved bugs.

On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <u...@apache.org> wrote:
@Max: I think you wanted to look into whether we can use Apache's
Jenkins server for our builds instead of Travis. Did you ever get
around to looking into it? If yes: What's your opinion on replacing
Travis with Jenkins? Is it a viable option? Would it solve the
Travis-specific problems?
I've experimented with the ASF Jenkins installation while setting up
our nightly snapshot builds. I've observed that the build servers are
pretty busy. I don't know how busy they are compared to the Travis
servers, or whether we could get more stable builds using Jenkins. I
guess we would have to try it out over a period of time.

I was hesitant to enable Jenkins for pull requests because I didn't
want to spam the ASF servers with builds. Also, there are some
remaining steps for a good integration, like making the YARN logs
available (not hard to do, though).

What do you think about enabling Jenkins builds for the master and
seeing how that goes?

On Wed, Apr 27, 2016 at 2:54 PM, Ufuk Celebi <u...@apache.org> wrote:
Filed an issue with INFRA: https://issues.apache.org/jira/browse/INFRA-11773

@Robert: I agree, but we still see failing builds over and over again.
At best this is annoying; at worst it "hides" newly introduced bugs.

On Wed, Apr 27, 2016 at 2:41 PM, Till Rohrmann <trohrm...@apache.org> wrote:
It is good to hear that we can solve most of the failing builds so
easily. We should then iterate over the open test-stability issues to
see whether they are still valid after we've merged PR 1915.

On Wed, Apr 27, 2016 at 2:25 PM, Robert Metzger <rmetz...@apache.org> wrote:

I'm not sure the issue is as big as it seems at first sight.
The reason why all builds of master are red on Travis is that the
cache of the 5th build is invalid. We have to ask INFRA to delete the
caches, and then they'll be green again.
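
Until then, we could also make the cache less likely to go stale on
our side. A minimal sketch of what one might add to the before_cache
section of .travis.yml (the path is an assumption about our setup):

    # drop Flink's own snapshot artifacts before Travis stores the cache,
    # so one broken build cannot poison the Maven repository for the next
    rm -rf $HOME/.m2/repository/org/apache/flink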

On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <u...@apache.org> wrote:

Along the lines of what Greg already mentioned, I would like to
reiterate that Travis is often a problem too:
- long build times, and we are reaching the time limit (a possible
split of the build is sketched below)
- unreliable I/O
- unreliable resolving of build dependencies
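
One way to stay under the limit would be to split the build into
several smaller Travis jobs. A minimal sketch using Maven's -pl/-am
flags; the module grouping here is hypothetical, not a concrete
proposal for the actual split:

    # job 1: build and test the core APIs
    mvn verify -pl flink-core,flink-java -am
    # job 2: build and test the runtime separately
    mvn verify -pl flink-runtime -am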

@Max: I think you wanted to look into whether we can use Apache's
Jenkins server for our builds instead of Travis. Did you ever get
around to looking into it? If yes: What's your opinion on replacing
Travis with Jenkins? Is it a viable option? Would it solve the
Travis-specific problems?

On the other hand, the very slow Travis machines have also helped us
discover some hard-to-catch race conditions.

– Ufuk


On Wed, Apr 27, 2016 at 2:01 PM, Greg Hogan <c...@greghogan.com> wrote:
We have also started running over Travis' 2-hour limit for the
longest build.
Greg


On Apr 27, 2016, at 7:53 AM, Ufuk Celebi <u...@apache.org> wrote:

Hi Till,

Thank you for bringing this up. We really need to fix this.

Filing JIRAs with critical priority was how we tried to solve it in
the past, but obviously it did not work. There seems to be a mismatch
between assigned and actual priorities.

As a first step, I would volunteer to gather a list of tests that
have failed in the last few weeks and make sure that we have JIRAs
for them.

As a next step, we should coordinate how to resolve those issues
(maybe prioritized by failure frequency) to get master stable again.
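
To get a rough frequency count, something like this could work on a
set of downloaded build logs. A minimal sketch, assuming the logs sit
in ./logs and that Surefire's "<<< FAILURE!" / "<<< ERROR!" markers
appear in them (the exact format varies between Surefire versions):

    # tally test classes next to Maven's failure markers, most frequent first
    grep -h '<<< \(FAILURE\|ERROR\)!' logs/*.txt \
      | grep -o '[A-Za-z0-9.]*\(Test\|ITCase\)[A-Za-z0-9]*' \
      | sort | uniq -c | sort -rn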

– Ufuk


On Wed, Apr 27, 2016 at 12:12 PM, Till Rohrmann <trohrm...@apache.org> wrote:
Hi Flink community,

I just wanted to raise awareness that in the last 16 days there was
just a single Travis build of master which passed all tests. This
indicates that we have some serious problems with our test stability
or, even worse, a problem with the master itself. Having an unstable
master makes it really hard to assess whether new changes actually
broke something or whether the failing test was unrelated.

We currently have 37 open issues labeled with test-stability, and
most of them have a critical priority. Therefore, I would propose
that we try to tackle them as soon as possible in order to improve
our test stability.

Cheers,
Till
