My gut feeling is that having a CI that only runs on a specific command
will not help too much.

What about going with nightly builds then? We could set up the ARM CI the
same way as the Travis CI nightly builds (cron builds). They report build
failures to "bui...@flink.apache.org".
Maybe Chesnay or Jark could help with what needs to be done to post to that
mailing list?

A requirement would be that the builds are stable from the ARM
perspective, meaning that there are currently no failures caused by
ARM-specific issues.

What do the others think?


On Tue, Sep 3, 2019 at 4:40 AM Xiyuan Wang <wangxiyuan1...@gmail.com> wrote:

> The ARM CI trigger has been changed to `github comment` only. This means
> that a PR won't start the ARM test unless a `check_arm` comment is added,
> as I did in the PR[1].
>
> A POC for a Flink nightly end-to-end test job has been created as well[2].
> I'll improve it further.
>
> Any feedback or questions?
>
>
> [1]: https://github.com/apache/flink/pull/9416
>      https://github.com/apache/flink/pull/9416#issuecomment-527268203
> [2]: https://github.com/theopenlab/openlab-zuul-jobs/pull/631
>
>
> Thanks
>
> On Mon, Aug 26, 2019 at 7:41 PM Xiyuan Wang <wangxiyuan1...@gmail.com> wrote:
>
> > Until the ARM CI is ready, I can disable the CI test for each PR and let
> > it be triggered only by a PR comment. This is quite easy for OpenLab to do.
> >
> > OpenLab has many job pipelines[1]. Currently I use the `check` pipeline in
> > https://github.com/apache/flink/pull/9416. Its job trigger contains
> > github_action and github_comment[2]. I can create a new pipeline for
> > Flink whose trigger contains only github_comment, like:
> >
> > trigger:
> >   github:
> >     - event: pull_request
> >       action: comment
> >       comment: (?i)^\s*recheck_arm_build\s*$
> >
> > This way the ARM job will not run for every PR. It will run only for
> > PRs that have a `recheck_arm_build` comment.
> >
> > Then, once the ARM CI is ready, I can re-enable it for every PR.
> >
> >
> > Nightly tests can be added as well, of course. OpenLab has a kind of job
> > called a `periodic job`. We can use it for Flink's nightly tests. If any
> > error occurs, the report can be sent to bui...@flink.apache.org as well.
> >
> > [1]: https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml
> > [2]: https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml#L10-L19
> >
> > On Mon, Aug 26, 2019 at 6:13 PM Stephan Ewen <se...@apache.org> wrote:
> >
> >> Adding CI builds for ARM only makes sense when we actually take them
> >> into account as "blocking a merge"; otherwise there is no point in
> >> having them. So we would need to be prepared to do that.
> >>
> >> The cases where something runs on UNIX/x64 but fails on ARM are few,
> >> and so far they seem to have been related to libraries or some magic
> >> that tries to do system-dependent actions outside Java.
> >>
> >> One worthwhile discussion could be whether to run the ARM CI builds as
> >> part of the nightly tests, not on every commit.
> >> There are a lot of nightly tests, for example for different Java /
> >> Scala / Hadoop versions.
> >>
> >> On Mon, Aug 26, 2019 at 10:46 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
> >> wrote:
> >>
> >> > Sorry, maybe my wording was misleading.
> >> >
> >> > We are just starting to add ARM support, so the CI is non-voting at
> >> > the moment to avoid blocking normal Flink development.
> >> >
> >> > But once the ARM CI works well and is stable enough, we should mark it
> >> > as voting. This means that in the future, if the ARM test fails in a
> >> > PR, the PR cannot be merged. The test log may tell developers what the
> >> > error is. If a developer needs to debug the details on an ARM VM,
> >> > OpenLab can provide one.
> >> >
> >> > Adding ARM CI can make sure Flink supports ARM natively.
> >> >
> >> > I left a workflow in the PR; I'd like to reproduce it here:
> >> >
> >> >    1. Add the basic build script to ensure the CI system and build job
> >> >    work as expected. The job should be marked as non-voting at first,
> >> >    which means a CI test failure won't block a Flink PR from being
> >> >    merged.
> >> >    2. Add the test script to run unit/integration tests. At this step
> >> >    the -fn (--fail-never) parameter will be added to mvn test. It will
> >> >    run the full set of test cases in Flink, so that we can find which
> >> >    tests fail on ARM.
> >> >    3. Fix the test failures one by one.
> >> >    4. Once all the tests pass, remove the -fn parameter and keep
> >> >    watching the CI's status for some days. If bugs come up then, fix
> >> >    them as we usually do for travis-ci.
> >> >    5. Once the CI is stable enough, remove the non-voting tag, so that
> >> >    the ARM CI will be the same as travis-ci: one of the gates for
> >> >    Flink PRs.
> >> >    6. Finally, the Flink community can announce and release a Flink ARM
> >> >    version.
> >> >
> >> >
> >> > On Mon, Aug 26, 2019 at 2:25 PM Chesnay Schepler <ches...@apache.org> wrote:
> >> >
> >> >> I'm sorry, but if these issues are only fixed later anyway I see no
> >> >> reason to run these tests on each PR. We're just adding noise to each
> >> >> PR that everyone will just ignore.
> >> >>
> >> >> I'm curious as to the benefit of having this directly in Flink; why
> >> >> aren't the ARM builds run outside of the Flink project, and fixes for
> >> >> it provided?
> >> >>
> >> >> It seems to me like nothing about these ARM builds is actually
> >> >> handled by the Flink project.
> >> >>
> >> >> On 26/08/2019 03:43, Xiyuan Wang wrote:
> >> >> > Thanks to Stephan for bringing up this topic.
> >> >> >
> >> >> > The package build jobs work well now. I have a simple online demo
> >> >> > which is built and run on an ARM VM. Feel free to give it a try[1].
> >> >> >
> >> >> > As the first step for ARM support, maybe it's good to add them now.
> >> >> >
> >> >> > As for the next step, the test part is still broken. This relates
> >> >> > to some issues we found:
> >> >> >
> >> >> > 1. Some unit tests fail[1] because of Java coding issues. These
> >> >> > kinds of failures can be fixed easily.
> >> >> > 2. Some tests fail because they depend on third-party libraries[2],
> >> >> > including frocksdb, MapR Client and Netty. These don't have an ARM
> >> >> > release.
> >> >> >      a. Frocksdb: I'm testing it locally now with `make check_some`
> >> >> > and `make jtest`, similar to its Travis job. Three tests fail under
> >> >> > `make check_some`. Please see the ticket for more details. Once the
> >> >> > tests pass, frocksdb can release an ARM package.
> >> >> >      b. MapR Client: This belongs to the MapR company. At this
> >> >> > moment, maybe we should skip MapR support for Flink on ARM.
> >> >> >      c. Netty: Netty actually runs well on our ARM machine. We will
> >> >> > ask the Netty community to release ARM support. If they do not want
> >> >> > to, OpenLab will host a Maven repository for some common libraries
> >> >> > on ARM.
> >> >> >
> >> >> >
> >> >> > Regarding Chesnay's concern:
> >> >> >
> >> >> > Firstly, the OpenLab team will keep maintaining and fixing the ARM
> >> >> > CI. This means that once a build or test fails, we'll fix it at once.
> >> >> > Secondly, OpenLab can provide ARM VMs to everyone for reproducing
> >> >> > and testing. You just need to create a Test Request issue in
> >> >> > openlab[3]. Then we'll create ARM VMs for you, and you can log in
> >> >> > and do whatever you need.
> >> >> >
> >> >> > Does it make sense?
> >> >> >
> >> >> > [1]: http://114.115.168.52:8081/#/overview
> >> >> > [1]: https://issues.apache.org/jira/browse/FLINK-13449
> >> >> >        https://issues.apache.org/jira/browse/FLINK-13450
> >> >> > [2]: https://issues.apache.org/jira/browse/FLINK-13598
> >> >> > [3]: https://github.com/theopenlab/openlab/issues/new/choose
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Sat, Aug 24, 2019 at 12:10 AM Chesnay Schepler <ches...@apache.org> wrote:
> >> >> >
> >> >> >> I'm wondering what we are supposed to do if the build fails?
> >> >> >> We aren't providing any guides on setting up an ARM dev
> >> >> >> environment, so reproducing it locally isn't possible.
> >> >> >>
> >> >> >> On 23/08/2019 17:55, Stephan Ewen wrote:
> >> >> >>> Hi all!
> >> >> >>>
> >> >> >>> As part of the Flink on ARM effort, there is a pull request that
> >> >> >>> triggers a build on OpenLabs CI for each push and runs tests on
> >> >> >>> ARM machines.
> >> >> >>>
> >> >> >>> Currently that build is roughly equivalent to what the "core" and
> >> >> >>> "tests" profiles do on Travis.
> >> >> >>> The result will be posted to the PR comments, similar to the
> >> >> >>> Flink Bot's Travis build result.
> >> >> >>> The build currently passes :-) so Flink seems to be okay on ARM.
> >> >> >>>
> >> >> >>> My suggestion would be to try and add this and gather some
> >> >> >>> experience with it.
> >> >> >>> The Travis build results should be our "ground truth" and the ARM
> >> >> >>> CI (openlabs CI) would be "informational only" at the beginning,
> >> >> >>> but helping us understand when we break ARM support.
> >> >> >>>
> >> >> >>> You can see this in the PR that adds the openlabs CI config:
> >> >> >>> https://github.com/apache/flink/pull/9416
> >> >> >>>
> >> >> >>> Any objections?
> >> >> >>>
> >> >> >>> Best,
> >> >> >>> Stephan
> >> >> >>>
> >> >> >>
> >> >>
> >> >>
> >>
> >
>
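
For anyone who wants to see which PR comments the comment trigger quoted
above would accept, here is a minimal sketch in Python. It only assumes that
the pattern `(?i)^\s*recheck_arm_build\s*$` is evaluated with Python-style
regular expression semantics; the helper name `triggers_arm_build` is
hypothetical and not part of OpenLab or Zuul.

```python
import re

# Pattern copied from the quoted trigger config; (?i) makes it
# case-insensitive, and ^...$ requires the whole comment to match.
ARM_RECHECK = re.compile(r"(?i)^\s*recheck_arm_build\s*$")


def triggers_arm_build(comment: str) -> bool:
    """Hypothetical helper: True if the comment consists only of the
    trigger phrase, optionally surrounded by whitespace."""
    return ARM_RECHECK.search(comment) is not None


if __name__ == "__main__":
    samples = [
        "recheck_arm_build",
        "  Recheck_ARM_Build \n",
        "please recheck_arm_build",
        "recheck_arm_build now",
    ]
    for text in samples:
        print(repr(text), triggers_arm_build(text))
```

Under that assumption, the phrase has to be the entire comment: surrounding
whitespace and different casing are accepted, but any extra words in the
comment would not start the ARM job.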
