Re: Master test stability poor

Robert Metzger Mon, 23 May 2016 03:01:22 -0700

We could also try to disable the caching of the .m2 directory (I suspect
that it contains broken jar files). The problem is that this will make
the builds slower on Travis because we would need to download more.


On Mon, May 23, 2016 at 10:18 AM, Chesnay Schepler <[email protected]>
wrote:

> If this doesn't work we may want to think about disabling the problematic
> profile temporarily.
>
>
> On 23.05.2016 09:53, Ufuk Celebi wrote:
>
>> Caches have been cleared again (see
>> https://issues.apache.org/jira/browse/INFRA-11773). The first time did
>> not help. This second request was more an act of desperation. :-(
>> Let's see what happens now.
>>
>> On Wed, Apr 27, 2016 at 3:24 PM, Maximilian Michels <[email protected]>
>> wrote:
>>
>>> +1 for making an effort to tackle test stability problems and
>>> any bugs potentially involved.
>>>
>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <[email protected]> wrote:
>>>
>>>> @Max: I think you wanted to look into whether we can use Apache's
>>>> Jenkins server for our builds instead of Travis. Did you ever get
>>>> around to looking into it? If yes: What's your opinion on replacing
>>>> Travis with Jenkins? Is it a viable option? Would it improve the
>>>> Travis-specific problems?
>>>>
>>> I've experimented with the ASF Jenkins installation while setting up
>>> our nightly snapshot builds. I've observed that the build servers are
>>> pretty busy. I don't know how busy they are compared to the Travis
>>> servers and whether we could have more stable builds using Jenkins. I
>>> guess we would have to try over a period of time.
>>>
>>> I was hesitant to enable Jenkins for pull requests because I didn't
>>> want to spam the ASF servers with builds. Also, there are some
>>> remaining steps for a good integration like making the Yarn logs
>>> available (not hard to do though).
>>>
>>> What do you think about enabling Jenkins builds for master and seeing
>>> how that goes?
>>>
>>> On Wed, Apr 27, 2016 at 2:54 PM, Ufuk Celebi <[email protected]> wrote:
>>>
>>>> Filed an issue with INFRA:
>>>> https://issues.apache.org/jira/browse/INFRA-11773
>>>>
>>>> @Robert: I agree, but we still see failing builds over and over again.
>>>> At best it is annoying, at worst it "hides" new bugs being introduced.
>>>>
>>>> On Wed, Apr 27, 2016 at 2:41 PM, Till Rohrmann <[email protected]>
>>>> wrote:
>>>>
>>>>> It is good to hear that we can solve most of the failing builds so
>>>>> easily. We should then iterate over the open test-stability issues to
>>>>> see whether they are still valid after we've merged PR 1915.
>>>>>
>>>>> On Wed, Apr 27, 2016 at 2:25 PM, Robert Metzger <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I'm not sure the issue is as big as it seems at first sight.
>>>>>> The reason why all the builds of master are red on Travis is that the
>>>>>> cache of the 5th build is invalid. We have to ask Infra to delete the
>>>>>> caches, and then they'll be green again.
>>>>>>
>>>>>> On Wed, Apr 27, 2016 at 2:13 PM, Ufuk Celebi <[email protected]> wrote:
>>>>>>
>>>>>>> Along the lines of what Greg already mentioned, I would like to
>>>>>>> reiterate that Travis is often a problem too:
>>>>>>> - long build times (we are reaching the time limit)
>>>>>>> - unreliable I/O
>>>>>>> - unreliable resolving of build dependencies
>>>>>>>
>>>>>>> @Max: I think you wanted to look into whether we can use Apache's
>>>>>>> Jenkins server for our builds instead of Travis. Did you ever get
>>>>>>> around to looking into it? If yes: What's your opinion on replacing
>>>>>>> Travis with Jenkins? Is it a viable option? Would it improve the
>>>>>>> Travis-specific problems?
>>>>>>>
>>>>>>> On the other hand, the very slow Travis machines have also helped us
>>>>>>> discover some hard-to-catch race conditions.
>>>>>>>
>>>>>>> – Ufuk
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 27, 2016 at 2:01 PM, Greg Hogan <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We have also started running over Travis' 2-hour limit for the
>>>>>>>> longest build.
>>>>>>>>
>>>>>>>> Greg
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 27, 2016, at 7:53 AM, Ufuk Celebi <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Till,
>>>>>>>>>
>>>>>>>>> thank you for bringing this up. We really need to fix this.
>>>>>>>>>
>>>>>>>>> Filing JIRAs with critical priority was how we tried to solve it in
>>>>>>>>> the past, but obviously it did not work. There seems to be a
>>>>>>>>> mismatch between assigned and actual priorities.
>>>>>>>>>
>>>>>>>>> As a first step, I would volunteer to gather a list of tests that
>>>>>>>>> have failed in the last few weeks and make sure that we have JIRAs
>>>>>>>>> for them.
>>>>>>>>>
>>>>>>>>> As a next step, we should coordinate how to resolve those issues
>>>>>>>>> (maybe prioritized by failure frequency) to get master stable
>>>>>>>>> again.
>>>>>>>>>
>>>>>>>>> – Ufuk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Apr 27, 2016 at 12:12 PM, Till Rohrmann <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Flink community,
>>>>>>>>>>
>>>>>>>>>> I just wanted to raise awareness that in the last 16 days there was
>>>>>>>>>> just a single Travis build of master which passed all tests. This
>>>>>>>>>> indicates that we have some serious problems with our test stability
>>>>>>>>>> or, even worse, a problem with master itself. Having an unstable
>>>>>>>>>> master makes it really hard to assess whether new changes actually
>>>>>>>>>> broke something or whether the failing test was unrelated.
>>>>>>>>>>
>>>>>>>>>> We currently have 37 open issues labeled with test-stability, and
>>>>>>>>>> most of them have critical priority. Therefore, I would propose that
>>>>>>>>>> we try to tackle them as soon as possible in order to improve our
>>>>>>>>>> test stability.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Till
>>>>>>>>>>
>
