[
https://issues.apache.org/jira/browse/KUDU-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993254#comment-16993254
]
Adar Dembo commented on KUDU-3007:
----------------------------------
Sorry for not responding earlier; been thinking about the proposal and how we
can best leverage it.
To start, let me provide some context on how our builds and tests work today.
Kudu testing is mostly in pre-commit, with some ad hoc testing performed by
community members prior to a release. Despite the hostname, our
jenkins.kudu.apache.org master and slaves aren't ASF infrastructure; they're
GCP resources donated and managed by Cloudera. The setup consists of several GCP
VMs running a smattering of Docker containers via Kubernetes. The source code for
all of that infra can be found
[here|https://github.com/cloudera/kudu-upstream-infra]. Builds are performed
inside these containers. C++ and Java tests, however, use the [dist-test
framework|https://github.com/cloudera/dist_test] to execute across a variety of
GCP VMs in parallel. When a build is ready to execute tests, it submits them in
bulk as a job to the dist-test framework hosted by Cloudera. Each job is broken
down into a set of tasks (one per test) which are farmed out to a pool of VMs,
autoscaling that pool as needed to accommodate the load.
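To make the job/task split described above concrete, here is a minimal Python sketch. It is not dist-test's actual API: the names Task, split_job, pool_size, run_task, and run_job are hypothetical, a local thread pool stands in for the pool of VMs, and the sizing rule is an invented stand-in for autoscaling.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One unit of work: a single test belonging to a job (hypothetical type)."""
    job_id: str
    test_name: str


def split_job(job_id, test_names):
    """Break one job into one task per test."""
    return [Task(job_id, name) for name in test_names]


def pool_size(num_tasks, tasks_per_worker=2, max_workers=8):
    """Toy autoscaling rule (an assumption): grow with load, up to a cap."""
    needed = -(-num_tasks // tasks_per_worker)  # ceiling division
    return max(1, min(max_workers, needed))


def run_task(task):
    """Placeholder for running one test on a worker; always reports 'passed'."""
    return (task.test_name, "passed")


def run_job(job_id, test_names):
    """Submit all tests in bulk and collect one result per test."""
    tasks = split_job(job_id, test_names)
    with ThreadPoolExecutor(max_workers=pool_size(len(tasks))) as pool:
        return dict(pool.map(run_task, tasks))
```

The point of the sketch is only the shape of the flow: tests are submitted in bulk, each becomes an independent task, and the worker count follows the load rather than being fixed.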
So how can we integrate aarch64 resources into all of this? Some thoughts:
* The resources donated to builds.apache.org as part of INFRA-19369 aren't
immediately available to us, since our Jenkins infra is separate from ASF's
infra.
* We can certainly add your ARM VMs as Jenkins slaves to the Cloudera infra,
provided that integrates cleanly with the [Kubernetes-based approach we
use|https://github.com/cloudera/kudu-upstream-infra].
* Reusing dist-test will be challenging because GCP doesn't offer ARM virtual
hardware at all, and some aspects of dist-test are hardcoded for GCP. That
isn't to say it can't be done, but it'd require a non-trivial investment on
your part to understand how dist-test works, modify it so it's suitable for
your ARM VM pool, and host and manage a second dist-test deployment.
* Without dist-test, I wouldn't want ARM-based Kudu tests run in pre-commit, as
doing so would significantly lengthen the development feedback loop.
* So maybe the right approach is a separate Jenkins job in
jenkins.kudu.apache.org that runs periodically, building Kudu and running tests
on the new ARM slaves? The challenge there will be to surface failures loudly
enough that regressions are caught and addressed promptly.
* Hooking our gerrit up to OpenLab CI is intriguing, but does that imply that
the tests are run pre-commit and gate the change? If so, we'll
have the same longer feedback loop problem I described earlier. If not, test
results may be published back to gerrit well after the changes are merged,
making them easy to ignore.
* Perhaps the path of least resistance is to stand up a completely separate
build pipeline for Kudu in OpenLab CI. The only shared infrastructure would be
build-support/jenkins/build-and-test.sh, the script used to run a build and
some tests. It could run periodically, or it could run post-commit when a
change is merged to master. Tests would run serially and could potentially take
a while to complete. We'd just need to figure out how to surface the results
back to some place where devs will notice.
Let me know what you think. I'm curious whether other Kudu developers more
familiar with our infra and dist-test have any thoughts (cc [~tlipcon]).
> ARM/aarch64 platform support
> ----------------------------
>
> Key: KUDU-3007
> URL: https://issues.apache.org/jira/browse/KUDU-3007
> Project: Kudu
> Issue Type: Improvement
> Reporter: liusheng
> Priority: Critical
>
> As an important alternative to the x86 architecture, the Aarch64 (ARM)
> architecture is currently the dominant architecture in small devices such as
> phones, IoT devices, security cameras, and drones. In addition, more and more
> hardware and cloud vendors are starting to provide ARM resources, such as AWS,
> Huawei, Packet, and Ampere. ARM servers are usually cheaper than x86
> servers, and more and more of them now offer performance comparable to
> x86 servers, and are even more efficient in some areas.
> We want to propose adding an Aarch64 CI for KUDU to promote support for
> KUDU on Aarch64 platforms. We are willing to provide machines to the current
> CI system and manpower to manage the CI and fix any problems that occur.