Re: [DISCUSS] Leveraging cloud computing resources for Arrow test workloads

Brian Hulette Thu, 12 Mar 2020 17:47:12 -0700

> * What kind of devops tooling would be appropriate to provision and
> manage the instances, scaling up and down based on need?
> * What CI/CD platform would be appropriate to dispatch work to the
> cloud nodes (taking into consideration the high costs of sysadmin, and
> seeking to minimize nodes sitting unused)?


I looked (very) briefly into solutions for running CI/CD workers on GCP
and just wanted to share some findings.
Appveyor claims it can auto-scale GCE instances [1] but I don't think it
would go beyond 5 concurrent "self-hosted" jobs [2]. Would that be a
problem?
Buildkite has documentation about running agents on a scalable GKE cluster
[3], but unfortunately it offers no way to auto-scale based on the job
backlog. We could maybe roll our own, or contribute something, based on
their AWS scaler [4].
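
For what it's worth, the core of a backlog-based scaler is small. Here is a
minimal sketch in Python, assuming we can poll the CI service for the number
of scheduled and running jobs. The function and parameter names are
hypothetical and not taken from any Buildkite or GCP API; actually resizing
the GKE node pool or GCE instance group is left out.

```python
import math


def desired_instances(scheduled_jobs, running_jobs,
                      agents_per_instance=1, min_size=0, max_size=10):
    """Return how many worker instances we would want for the current load.

    All parameters are illustrative assumptions:
      scheduled_jobs      -- jobs waiting in the queue (the backlog)
      running_jobs        -- jobs currently executing on an agent
      agents_per_instance -- how many agents each instance runs
      min_size, max_size  -- bounds on the size of the instance pool
    """
    if agents_per_instance < 1:
        raise ValueError("agents_per_instance must be at least 1")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")

    # Every scheduled or running job needs one agent; round up to whole
    # instances so that a partial remainder still gets capacity.
    needed = math.ceil((scheduled_jobs + running_jobs) / agents_per_instance)

    # Clamp to the configured pool bounds so an empty queue scales down to
    # min_size and a large backlog cannot exceed max_size.
    return max(min_size, min(max_size, needed))
```

With min_size=0 the pool drains completely when the queue is empty, which
addresses the "nodes sitting unused" concern at the cost of a cold-start
delay for the first job.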

[1] https://www.appveyor.com/docs/byoc/gce/
[2] https://www.appveyor.com/pricing/
[3] https://buildkite.com/docs/agent/v3/gcloud#running-the-agent-on-google-kubernetes-engine
[4] https://github.com/buildkite/buildkite-agent-scaler

On Wed, Mar 11, 2020 at 7:49 PM Micah Kornfield <[email protected]>
wrote:

> >
> > * Who's going to pay for it? Perhaps Amazon, Google, or Microsoft can
> > donate cloud compute credits to the project
>
> Google has offered a donation of GCP credits based on some estimates I made
> last year when we were facing Travis CI issues. I'm happy to try to do some
> integration work to help make this happen.
>
> For the other questions, I'm happy to do some research, but also happy if
> someone else would like to take up the work here. I think one blocker in
> the past has been restrictions from Apache Infra. Is there any
> documentation on what is and is not supported on that front?
>
> Thanks,
> Micah
> On Wed, Mar 11, 2020 at 3:17 PM Wes McKinney <[email protected]> wrote:
>
> > hi folks,
> >
> > There has periodically been a discussion about employing dedicated
> > compute resources to serve our testing needs beyond what can be
> > accomplished in free / public CI services like GitHub Actions,
> > Appveyor, etc. For example:
> >
> > * Workloads requiring a CUDA-capable GPU
> > * Tests requiring a lot of memory
> > * ARM architecture
> >
> > While physical machines can be hooked up to some CI/CD services like
> > Github Actions and Buildkite, I believe we should not be 100%
> > dependent on the availability of such hardware (the recent tornado in
> > Nashville is a good example of what can go wrong).
> >
> > At some point it will make sense to be able to provision cloud hosts
> > (either temporary spot instances or persistent nodes) to meet these
> > needs. This brings up several questions:
> >
> > * Who's going to pay for it? Perhaps Amazon, Google, or Microsoft can
> > donate cloud compute credits to the project
> > * What kind of devops tooling would be appropriate to provision and
> > manage the instances, scaling up and down based on need?
> > * What CI/CD platform would be appropriate to dispatch work to the
> > cloud nodes (taking into consideration the high costs of sysadmin, and
> > seeking to minimize nodes sitting unused)?
> >
> > This will probably take time to work out and there is significant
> > engineering involved in achieving any solution, but it would be good
> > to have all the options on the table with a frank analysis of the
> > pros/cons and costs (both in money and volunteer time) involved.
> >
> > Thanks,
> > Wes
> >
>
