Thanks for the clarification, Robert.

Since the first step of the plan is to replace the Travis PR runs, I checked
all PR builds since 2020-01-01 (PR #10735 to #11526) [1]. The results are:

* Travis FAILURE: 298
* Travis SUCCESS: 649 (68.5%)
* Azure FAILURE: 420
* Azure SUCCESS: 571 (57.6%)

Since the patch for each run is the same for Travis and Azure, the failure
rate appears to be about 11 percentage points higher when running on Azure.
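
For reference, the percentages follow directly from the counts listed above. A minimal Python sketch of the arithmetic (the function name is only illustrative, not something from our tooling):

```python
def success_rate(success: int, failure: int) -> float:
    """Return the share of successful builds, in percent."""
    return 100.0 * success / (success + failure)

# Counts taken from the list above.
travis = success_rate(success=649, failure=298)
azure = success_rate(success=571, failure=420)

# Difference in percentage points, not a relative change.
gap = travis - azure

print(f"Travis: {travis:.1f}%  Azure: {azure:.1f}%  gap: {gap:.1f} points")
```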

However, with the just-merged fix for uploading logs (FLINK-16480), I
believe the success rate of Azure could now compete with Travis (file
uploads account for 20% of the failures according to the report [2]).
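
As a rough sanity check, here is a small what-if calculation in Python. It rests on assumptions of mine that the data does not confirm: that the 20% share from the report [2] also applies to the 420 Azure failures counted above, and that each of those runs would otherwise have succeeded.

```python
# Counts from the PR build survey above.
azure_success = 571
azure_failure = 420

# ASSUMPTION: 20% of the Azure failures were caused by the log upload
# and would have been successful runs with the fix in place.
upload_share = 0.20
recovered = round(azure_failure * upload_share)

adjusted_success = azure_success + recovered
adjusted_failure = azure_failure - recovered
adjusted_rate = 100.0 * adjusted_success / (adjusted_success + adjusted_failure)

print(f"recovered runs: {recovered}, adjusted Azure success rate: {adjusted_rate:.1f}%")
```

Under these assumptions the Azure success rate would land within a few points of the Travis number, which is why I consider the two comparable.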

So, based on these numbers, I'm +1 to disabling the Travis runs.

Best Regards,
Yu

[1]
https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
[2]
https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4

On Thu, 26 Mar 2020 at 03:28, Robert Metzger <rmetz...@apache.org> wrote:

> Thank you for your responses.
>
> @Yu Li: In the current master, the log upload always fails if the e2e job
> failed. I just merged a PR that fixes this issue [1]. The problem was not
> really the network stability, but rather a problem with the interaction of the
> jobs in the pipeline (the e2e job did not set the right variables for the
> log upload).
> Secondly, you are looking at the report of the "flink-ci.flink" pipeline,
> where pull requests are built. Naturally, pull request builds fail all the
> time, because the PRs are not yet perfect.
>
> "flink-ci.flink-master" is the right pipeline to look at:
>
> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
> We have a fairly high number of failures there, because we currently have
> some issues downloading the Maven artifacts [2]. I'm already working with
> Chesnay on merging a fix for that.
>
>
> [1]
>
> https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
> [2]https://issues.apache.org/jira/browse/FLINK-16720
>
>
>
> On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > The easiest way to disable travis for pushes is to remove all builds
> > from the .travis.yml with a push/pr condition.
> >
> > On 25/03/2020 15:03, Robert Metzger wrote:
> > > Thank you for the feedback so far.
> > >
> > > Responses to the items Chesnay raised:
> > >
> > >> - by virtue of maintaining the past 2 releases we will have to maintain
> > >> any Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
> > >>
> > > Okay. I wasn't sure about the exact policy there.
> > >
> > >
> > >> - the azure setup doesn't appear to be equivalent yet since the java e2e
> > >> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
> > >> which SQLClientKafkaITCase isn't run
> > >>
> > > I filed a ticket to address this:
> > > https://issues.apache.org/jira/browse/FLINK-16778
> > >
> > >
> > >> - the nightly scripts still seem to be using a maven version other than
> > >> 3.2.5; from today on master:
> > >> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > >> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
> > >> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> > >> 2020-03-25T05:31:52.7518360Z [INFO]
> > >> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> > >>
> > > I'm planning to address this as part of
> > > https://issues.apache.org/jira/browse/FLINK-16411, where I work on
> > > centralizing all mvn invocations.
> > >
> > >
> > >> - there is no real benefit in retiring the travis support in CiBot; the
> > >> important part is whether Travis is run or not for pull requests.
> > >> From what I can tell though azure seems to be working fine for pull
> > >> requests, so +1 to at least disable the travis PR runs.
> > >
> > > So we disable Travis for https://github.com/flink-ci/flink ? I will do it
> > > once there are no new concerns and the above tickets are resolved.
> > >
> > > What about disabling Travis for master pushes (e.g. removing the
> > > .travis.yml file from master)?
> > >
> > >
> > > @Dian:
> > > Thanks a lot for your feedback.
> > >
> > >> - The report of Azure is still not viewable[1] (I noticed that Hequn has
> > >> also reported this issue in another thread). This is very useful
> > >> information.
> > >
> > > You are referring to the emails sent to builds@f.a.o, right?
> > > I have reported this both as a bug [1] and a feature request [2] to Azure.
> > > But I don't believe they will resolve this issue anytime soon.
> > > Azure has a notifications API that we could use to build a service that
> > > sends emails to that list, but I feel that this is really a waste of time.
> > > The URL in the link even contains the ID of the build. We would just need
> > > to extract this ID and generate the appropriate URL. I will try to directly
> > > reach the product management of AZP, maybe I can get some attention from
> > > there.
> > >
> > >
> > >
> > > [1]
> > > https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
> > > [2]
> > > https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
> > >
> > >
> > >
> > > On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <ches...@apache.org>
> > > wrote:
> > >
> > >> It was left out since it adds significant additional complexity and the
> > >> value is dubious at best for PRs that aren't merged shortly after the
> > >> build has finished.
> > >>
> > >> On 25/03/2020 10:28, Dian Fu wrote:
> > >>> Thanks for the information. I'm sorry that I wasn't aware of this before;
> > >>> I have checked the Travis build log and confirmed that this is true.
> > >>>
> > >>> @Chesnay Are there any specific reasons for this, and is it possible to
> > >>> add this back for Azure Pipelines?
> > >>>
> > >>> Thanks,
> > >>> Dian
> > >>>
> > >>>> On 25 March 2020, at 4:43 PM, Chesnay Schepler <ches...@apache.org> wrote:
> > >>>>
> > >>>> @Dian we haven't been rebasing PRs against master for months, ever
> > >>>> since we switched to CiBot.
> > >>>> On 25/03/2020 09:29, Dian Fu wrote:
> > >>>>> Hi Robert,
> > >>>>>
> > >>>>> Thanks a lot for your great work!
> > >>>>>
> > >>>>> Overall I'm +1 to switch to Azure as the primary CI tool if it's
> > >>>>> stable enough, as I think there is no need to run both Travis and
> > >>>>> Azure for one single PR.
> > >>>>>
> > >>>>> However, there are still some improvements to be made, and it would be
> > >>>>> great if these issues could be addressed before fully switching to Azure:
> > >>>>> - The report of Azure is still not viewable[1] (I noticed that Hequn
> > >>>>> has also reported this issue in another thread). This is very useful
> > >>>>> information.
> > >>>>> - For the PR test of the Azure pipeline, it seems that it will not
> > >>>>> rebase onto the master code before running the tests.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Dian
> > >>>>>
> > >>>>> [1]
> > >>>>> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > >>>>>> On 25 March 2020, at 3:33 PM, Chesnay Schepler <ches...@apache.org> wrote:
> > >>>>>>
> > >>>>>> Some thoughts:
> > >>>>>> - by virtue of maintaining the past 2 releases we will have to
> > >>>>>> maintain any Travis infrastructure as long as 1.10 is supported, i.e.,
> > >>>>>> until 1.12
> > >>>>>> - the azure setup doesn't appear to be equivalent yet since the java
> > >>>>>> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result
> > >>>>>> of which SQLClientKafkaITCase isn't run
> > >>>>>> - the nightly scripts still seem to be using a maven version other
> > >>>>>> than 3.2.5; from today on master:
> > >>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > >>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
> > >>>>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> > >>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
> > >>>>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> > >>>>>> - there is no real benefit in retiring the travis support in CiBot;
> > >>>>>> the important part is whether Travis is run or not for pull requests.
> > >>>>>> From what I can tell though azure seems to be working fine for pull
> > >>>>>> requests, so +1 to at least disable the travis PR runs.
> > >>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
> > >>>>>>> Hey devs,
> > >>>>>>>
> > >>>>>>> I would like to discuss whether it makes sense to fully switch to Azure
> > >>>>>>> Pipelines and phase out our Travis integration.
> > >>>>>>> More information on our Azure integration can be found here:
> > >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
> > >>>>>>> Travis will stay for the release-1.10 and older branches, as I have set up
> > >>>>>>> Azure only for the master branch.
> > >>>>>>>
> > >>>>>>> Proposal:
> > >>>>>>> - We keep the flinkbot infrastructure supporting both Travis and Azure
> > >>>>>>> around, while we still receive pull requests and pushes for the
> > >>>>>>> "master" and "release-1.10" branches.
> > >>>>>>> - We remove the travis-specific files from "master", so that builds are not
> > >>>>>>> triggered anymore
> > >>>>>>> - once we receive no more builds at Travis (because 1.11 has been
> > >>>>>>> released), we remove the remaining travis-related infrastructure
> > >>>>>>>
> > >>>>>>> What do you think?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Robert
> > >>
> > >>
> >
> >
>
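
On the quoted point that the notification link already contains the build ID: a minimal Python sketch of how that ID could be extracted and turned into a viewable link. The target format (`_build/results?buildId=...`) and the project name "Flink" are assumptions on my side, and the helper name is purely illustrative; the example URL is the one from the thread with the `tracking_data` parameter left out.

```python
from urllib.parse import parse_qs, urlparse


def build_results_url(notification_url: str, project: str = "Flink") -> str:
    """Derive a build results link from a notification link (hypothetical helper)."""
    parsed = urlparse(notification_url)
    query = parse_qs(parsed.query)  # also decodes %3a, %2f, ...

    # The builduri parameter looks like vstfs:///Build/Build/6593.
    build_uri = query["builduri"][0]
    build_id = build_uri.rstrip("/").rsplit("/", 1)[-1]
    if not build_id.isdigit():
        raise ValueError(f"no numeric build ID in builduri: {build_uri!r}")

    # The first path segment is the organization, e.g. "rmetzger".
    organization = parsed.path.strip("/").split("/")[0]

    # ASSUMPTION: results pages live under <org>/<project>/_build/results.
    return (
        f"https://{parsed.netloc}/{organization}/{project}"
        f"/_build/results?buildId={build_id}"
    )


url = (
    "https://dev.azure.com/rmetzger/web/build.aspx"
    "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
    "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593"
)
print(build_results_url(url))
```

The same function also handles the already-decoded form of the link (`builduri=vstfs:///Build/Build/6593`), since only the last path segment of `builduri` is used.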
