Sure, we can first run the daily ARM job the same way as the Travis CI
nightly jobs. Once it's stable enough, we can consider running it per PR.

BTW, I tested flink-end-to-end-test on ARM over the last few days. As on
Travis, all 7 scenarios were tested:

1. split_checkpoints.sh
2. split_sticky.sh
3. split_ha.sh
4. split_heavy.sh
5. split_misc_hadoopfree.sh
6. split_misc.sh
7. split_container.sh

Scenarios 1-6 work well after some local hacking and bug fixing:
    1. frocksdb doesn't have an official ARM release, so I built and
installed it locally for ARM.
          https://issues.apache.org/jira/browse/FLINK-13598
    2. Prometheus has an ARM release, but the test always downloads the x86
version. Downloading the correct version fixes the issue.
          https://issues.apache.org/jira/browse/FLINK-14086
    3. Elasticsearch 6.0+ enables the X-Pack machine learning feature by
default, but this feature doesn't support ARM, so Elasticsearch 6.0+ fails
to start on ARM. Setting `xpack.ml.enabled: false` fixes this issue.
          https://issues.apache.org/jira/browse/FLINK-14126
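
For the Prometheus item, one possible approach is to derive the archive name
from the machine architecture instead of hard-coding the x86 one. This is only
a rough sketch: it assumes the upstream release naming
`prometheus-<version>.linux-<arch>.tar.gz`, and the helper names and the
version number are made up for illustration, not taken from the Flink test
scripts.

```shell
#!/usr/bin/env bash

# Map a machine name (as printed by `uname -m`) to the architecture
# suffix used in Prometheus release archive names.
# Hypothetical helper, not part of the Flink e2e scripts.
prometheus_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the archive file name for a given version and machine name.
prometheus_archive() {
  local version="$1" arch
  arch="$(prometheus_arch "$2")" || return 1
  echo "prometheus-${version}.linux-${arch}.tar.gz"
}

# Example call with a fixed machine name; the version is illustrative only.
prometheus_archive "2.4.3" "aarch64"
```

In a real test script the second argument would be `"$(uname -m)"`, so the
same code picks the amd64 archive on Travis and the arm64 one on the ARM CI.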

The 7th scenario (container) failed because:
    1. docker-compose doesn't have an official ARM package. Installing it
with `apt install docker-compose` solves the problem.
    2. minikube doesn't support the ARM architecture. Using kubeadm for the
K8s installation solves the problem.

Fixing the problems mentioned above is not hard, so I think we can add the
Flink build, unit tests and e2e tests as nightly jobs now.

Any thoughts?

Thanks.

Stephan Ewen <se...@apache.org> wrote on Thu, Sep 19, 2019 at 5:44 PM:

> My gut feeling is that having a CI that only runs on a specific command
> will not help too much.
>
> What about going with nightly builds then? We could set up the ARM CI the
> same way as the Travis CI nightly builds (cron builds). They report build
> failures to "bui...@flink.apache.org".
> Maybe Chesnay or Jark could help with what needs to be done to post to that
> mailing list?
>
> A requirement would be that the builds are stable, from the ARM
> perspective, meaning that there are no failures at the moment caused by
> ARM-specific issues.
>
> What do the others think?
>
>
> On Tue, Sep 3, 2019 at 4:40 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
> wrote:
>
> > The ARM CI trigger has been changed to GitHub comments only. This means
> > that a PR won't start the ARM test unless a `check_arm` comment is added,
> > as I did in the PR[1].
> >
> > A POC for the Flink nightly end-to-end test job has been created as
> > well[2]. I'll keep improving it.
> >
> > Any feedback or questions?
> >
> >
> > [1]: https://github.com/apache/flink/pull/9416
> >      https://github.com/apache/flink/pull/9416#issuecomment-527268203
> > [2]: https://github.com/theopenlab/openlab-zuul-jobs/pull/631
> >
> >
> > Thanks
> >
> > Xiyuan Wang <wangxiyuan1...@gmail.com> wrote on Mon, Aug 26, 2019 at 7:41 PM:
> >
> > > Before ARM CI is ready, I can turn off the CI test for each PR and let
> > > it be triggered only by a PR comment. It's quite easy for OpenLab to do
> > > this.
> > >
> > > OpenLab has many job pipelines[1]. Now I use the `check` pipeline in
> > > https://github.com/apache/flink/pull/9416. The job trigger contains
> > > github_action and github_comment[2]. I can create a new pipeline for
> > > Flink whose trigger contains only github_comment, like:
> > >
> > > trigger:
> > >   github:
> > >     - event: pull_request
> > >       action: comment
> > >       comment: (?i)^\s*recheck_arm_build\s*$
> > >
> > > That way the ARM job will not run for every PR. It will run only for
> > > PRs that have a `recheck_arm_build` comment.
> > >
> > > Then once ARM CI is ready, I can add it back.
> > >
> > >
> > > Nightly tests can be added as well, of course. OpenLab has a kind of
> > > job called a `periodic job`. We can use it for the Flink nightly tests.
> > > If any error occurs, the report can be sent to bui...@flink.apache.org
> > > as well.
> > >
> > > [1]:
> > > https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml
> > > [2]:
> > > https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/pipelines.yaml#L10-L19
> > >
> > > Stephan Ewen <se...@apache.org> wrote on Mon, Aug 26, 2019 at 6:13 PM:
> > >
> > >> Adding CI builds for ARM only makes sense when we actually take them
> > >> into account as "blocking a merge", otherwise there is no point in
> > >> having them. So we would need to be prepared to do that.
> > >>
> > >> The cases where something runs in UNIX/x64 but fails on ARM are few
> > >> and so far seem to have been related to libraries or some magic that
> > >> tries to do system dependent actions outside Java.
> > >>
> > >> One worthwhile discussion could be whether to run the ARM CI builds as
> > >> part of the nightly tests, not on every commit.
> > >> There are a lot of nightly tests, for example for different Java /
> > >> Scala / Hadoop versions.
> > >>
> > >> On Mon, Aug 26, 2019 at 10:46 AM Xiyuan Wang <wangxiyuan1...@gmail.com>
> > >> wrote:
> > >>
> > >> > Sorry, maybe my wording was misleading.
> > >> >
> > >> > We are just starting to add ARM support, so the CI is non-voting at
> > >> > this moment to avoid blocking normal Flink development.
> > >> >
> > >> > But once the ARM CI works well and is stable enough, we should mark
> > >> > it as voting. That means that in the future, if the ARM test fails in
> > >> > a PR, the PR cannot be merged. The test log should tell developers
> > >> > what the error is. If a developer needs to debug the details on an
> > >> > ARM VM, OpenLab can provide one.
> > >> >
> > >> > Adding ARM CI can make sure Flink supports ARM natively.
> > >> >
> > >> > I left a workflow in the PR; I'd like to repeat it here:
> > >> >
> > >> >    1. Add the basic build script to ensure the CI system and build
> > >> >    job work as expected. The job should be marked as non-voting at
> > >> >    first, meaning a CI test failure won't block a Flink PR from being
> > >> >    merged.
> > >> >    2. Add the test script to run unit/integration tests. At this step
> > >> >    the --fn parameter will be added to mvn test. It will run the full
> > >> >    set of test cases in Flink, so that we can find which tests fail on
> > >> >    ARM.
> > >> >    3. Fix the test failures one by one.
> > >> >    4. Once all the tests pass, remove the --fn parameter and keep
> > >> >    watching the CI's status for some days. If bugs come up then, fix
> > >> >    them as we usually do for travis-ci.
> > >> >    5. Once the CI is stable enough, remove the non-voting tag, so that
> > >> >    the ARM CI will be the same as travis-ci: one of the gates for
> > >> >    Flink PRs.
> > >> >    6. Finally, the Flink community can announce and release a Flink
> > >> >    ARM version.
> > >> >
> > >> >
> > >> > Chesnay Schepler <ches...@apache.org> wrote on Mon, Aug 26, 2019 at 2:25 PM:
> > >> >
> > >> >> I'm sorry, but if these issues are only fixed later anyway I see no
> > >> >> reason to run these tests on each PR. We're just adding noise to
> > >> >> each PR that everyone will just ignore.
> > >> >>
> > >> >> I'm curious as to the benefit of having this directly in Flink; why
> > >> >> aren't the ARM builds run outside of the Flink project, and fixes
> > >> >> for it provided?
> > >> >>
> > >> >> It seems to me like nothing about these arm builds is actually
> > >> >> handled by the Flink project.
> > >> >>
> > >> >> On 26/08/2019 03:43, Xiyuan Wang wrote:
> > >> >> > Thanks to Stephan for bringing up this topic.
> > >> >> >
> > >> >> > The package build jobs work well now. I have a simple online demo
> > >> >> > which was built and runs on an ARM VM. Feel free to give it a
> > >> >> > try[1].
> > >> >> >
> > >> >> > As the first step for ARM support, maybe it's good to add them
> > >> >> > now.
> > >> >> >
> > >> >> > As for the next step, the test part is still broken. It relates
> > >> >> > to some points we found:
> > >> >> >
> > >> >> > 1. Some unit tests fail[2] because of Java code. These kinds of
> > >> >> > failures can be fixed easily.
> > >> >> > 2. Some tests fail because they depend on third-party
> > >> >> > libraries[3], including frocksdb, MapR Client and Netty. They
> > >> >> > don't have ARM releases.
> > >> >> >      a. Frocksdb: I'm testing it locally now with `make check_some`
> > >> >> > and `make jtest`, similar to its travis job. 3 tests fail with
> > >> >> > `make check_some`. Please see the ticket for more details. Once
> > >> >> > the tests pass, frocksdb can release an ARM package.
> > >> >> >      b. MapR Client: this belongs to the MapR company. At this
> > >> >> > moment, maybe we should skip MapR support for Flink on ARM.
> > >> >> >      c. Netty: actually Netty runs well on our ARM machine. We will
> > >> >> > ask the Netty community to release ARM support. If they do not
> > >> >> > want to, OpenLab will host a Maven repository for some common
> > >> >> > libraries on ARM.
> > >> >> >
> > >> >> >
> > >> >> > For Chesnay's concern:
> > >> >> >
> > >> >> > Firstly, the OpenLab team will keep maintaining and fixing the ARM
> > >> >> > CI. It means that once a build or test fails, we'll fix it at once.
> > >> >> > Secondly, OpenLab can provide ARM VMs to everyone for reproducing
> > >> >> > and testing. You just need to create a Test Request issue in
> > >> >> > openlab[4]. Then we'll create ARM VMs for you, and you can log in
> > >> >> > and do whatever you want.
> > >> >> >
> > >> >> > Does it make sense?
> > >> >> >
> > >> >> > [1]: http://114.115.168.52:8081/#/overview
> > >> >> > [2]: https://issues.apache.org/jira/browse/FLINK-13449
> > >> >> >        https://issues.apache.org/jira/browse/FLINK-13450
> > >> >> > [3]: https://issues.apache.org/jira/browse/FLINK-13598
> > >> >> > [4]: https://github.com/theopenlab/openlab/issues/new/choose
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > Chesnay Schepler <ches...@apache.org> wrote on Sat, Aug 24, 2019 at 12:10 AM:
> > >> >> >
> > >> >> >> I'm wondering what we are supposed to do if the build fails?
> > >> >> >> We aren't providing any guides on setting up an arm dev
> > >> >> >> environment; so reproducing it locally isn't possible.
> > >> >> >>
> > >> >> >> On 23/08/2019 17:55, Stephan Ewen wrote:
> > >> >> >>> Hi all!
> > >> >> >>>
> > >> >> >>> As part of the Flink on ARM effort, there is a pull request that
> > >> >> >>> triggers a build on OpenLabs CI for each push and runs tests on
> > >> >> >>> ARM machines.
> > >> >> >>>
> > >> >> >>> Currently that build is roughly equivalent to what the "core" and
> > >> >> >>> "tests" profiles do on Travis.
> > >> >> >>> The result will be posted to the PR comments, similar to the Flink
> > >> >> >>> Bot's Travis build result.
> > >> >> >>> The build currently passes :-) so Flink seems to be okay on ARM.
> > >> >> >>>
> > >> >> >>> My suggestion would be to try and add this and gather some
> > >> >> >>> experience with it.
> > >> >> >>> The Travis build results should be our "ground truth" and the ARM
> > >> >> >>> CI (openlabs CI) would be "informational only" at the beginning,
> > >> >> >>> but helping us understand when we break ARM support.
> > >> >> >>>
> > >> >> >>> You can see this in the PR that adds the openlabs CI config:
> > >> >> >>> https://github.com/apache/flink/pull/9416
> > >> >> >>>
> > >> >> >>> Any objections?
> > >> >> >>>
> > >> >> >>> Best,
> > >> >> >>> Stephan
> > >> >> >>>
> > >> >> >>
> > >> >>
> > >> >>
> > >>
> > >
> >
>
