Re: Master test stability poor

Ufuk Celebi Mon, 23 May 2016 00:54:48 -0700

Caches have been cleared again (see
https://issues.apache.org/jira/browse/INFRA-11773). The first time did
not help. This second request was more an act of desperation. :-(
Let's see what happens now.


On Wed, Apr 27, 2016 at 3:24 PM, Maximilian Michels <[email protected]> wrote:
> +1 for making an effort to tackle test stability problems and
> potential involved bugs.
>
> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <[email protected]> wrote:
>> @Max: I think you wanted to look into whether we can use Apache's
>> Jenkins server for our builds instead of Travis. Did you ever get
>> around to looking into it? If yes: What's your opinion on replacing
>> Travis with Jenkins? Is it a viable option? Would it improve the
>> Travis-specific problems?
>
> I've experimented with the ASF Jenkins installation while setting up
> our nightly snapshot builds. I've observed that the build servers are
> pretty busy. I don't know how busy they are compared to the Travis
> servers and whether we could have more stable builds using Jenkins. I
> guess we would have to try over a period of time.
>
> I was hesitant to enable Jenkins for pull requests because I didn't
> want to spam the ASF servers with builds. Also, there are some
> remaining steps for a good integration like making the Yarn logs
> available (not hard to do though).
>
> What do you think about enabling Jenkins builds for the master and
> seeing how that goes?
>
> On Wed, Apr 27, 2016 at 2:54 PM, Ufuk Celebi <[email protected]> wrote:
>> Filed an issue with INFRA: https://issues.apache.org/jira/browse/INFRA-11773
>>
>> @Robert: I agree, but still we see failing builds over and over again.
>> At best it is annoying, at worst it "hides" new bugs being introduced.
>>
>> On Wed, Apr 27, 2016 at 2:41 PM, Till Rohrmann <[email protected]> wrote:
>>> It is good to hear that we can so easily solve most of the failing
>>> builds. We should then iterate over the open test-stability issues to see
>>> whether they are still valid after we've merged PR 1915.
>>>
>>> On Wed, Apr 27, 2016 at 2:25 PM, Robert Metzger <[email protected]> wrote:
>>>
>>>> I'm not sure the issue is as big as it seems at first sight.
>>>> The reason why all the builds of master are red on travis is that the cache
>>>> of the 5th build is invalid. We have to ask infra to delete the caches and
>>>> then they'll be green again.
>>>>
>>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <[email protected]> wrote:
>>>>
>>>> > Along the lines of what Greg already mentioned, I would like to
>>>> > re-iterate that Travis is often a problem too:
>>>> > - long build times and we are reaching the time limit
>>>> > - unreliable I/O
>>>> > - unreliable resolving of build dependencies
>>>> >
>>>> > @Max: I think you wanted to look into whether we can use Apache's
>>>> > Jenkins server for our builds instead of Travis. Did you ever get
>>>> > around to looking into it? If yes: What's your opinion on replacing
>>>> > Travis with Jenkins? Is it a viable option? Would it improve the
>>>> > Travis-specific problems?
>>>> >
>>>> > On the other hand, the very slow Travis machines also helped us
>>>> > discover some hard-to-catch race conditions.
>>>> >
>>>> > – Ufuk
>>>> >
>>>> >
>>>> > On Wed, Apr 27, 2016 at 2:01 PM, Greg Hogan <[email protected]> wrote:
>>>> > > We have also started running over Travis' 2 hour limit for the longest
>>>> > > build.
>>>> > >
>>>> > > Greg
>>>> > >
>>>> > >
>>>> > >> On Apr 27, 2016, at 7:53 AM, Ufuk Celebi <[email protected]> wrote:
>>>> > >>
>>>> > >> Hi Till,
>>>> > >>
>>>> > >> thank you for bringing this up. We really need to fix this.
>>>> > >>
>>>> > >> Filing JIRAs with critical priority was how we tried to solve it in
>>>> > >> the past, but obviously it did not work. There seems to be a mismatch
>>>> > >> between assigned and actual priorities.
>>>> > >>
>>>> > >> As a first step, I would volunteer to gather a list of tests that
>>>> > >> have failed in recent weeks and make sure that we have JIRAs for
>>>> > >> them.
>>>> > >>
>>>> > >> As a next step, we should coordinate how to resolve those issues
>>>> > >> (maybe prioritized by failure frequency) to get master stable again.
>>>> > >>
>>>> > >> – Ufuk
>>>> > >>
>>>> > >>
>>>> > >>> On Wed, Apr 27, 2016 at 12:12 PM, Till Rohrmann <[email protected]> wrote:
>>>> > >>> Hi Flink community,
>>>> > >>>
>>>> > >>> I just wanted to raise awareness that in the last 16 days there was just a
>>>> > >>> single Travis build of master which passed all tests. This indicates that
>>>> > >>> we have some serious problems with our test stability or even worse a
>>>> > >>> problem with the master itself. Having an unstable master makes it really
>>>> > >>> hard to assess whether new changes actually broke something or whether the
>>>> > >>> failing test was unrelated.
>>>> > >>>
>>>> > >>> We currently have 37 open issues labeled with test-stability and most of
>>>> > >>> them have a critical priority. Therefore, I would propose that we try to
>>>> > >>> tackle them as soon as possible in order to improve our testing stability.
>>>> > >>>
>>>> > >>> Cheers,
>>>> > >>> Till
>>>> >
>>>>
