On Tue, Oct 9, 2018 at 6:02 PM Antoine Pitrou <anto...@python.org> wrote:
>
> On 09/10/2018 at 17:54, Wes McKinney wrote:
> > hi folks,
> >
> > After the packaging automation work for 0.10 was completed, we have
> > stalled out a bit on one of the objectives of this framework, which is
> > to allow contributors to define and add new tasks that can be run on
> > demand or as part of a nightly job.
> >
> > So we have some problems to solve:
> >
> > * How to define a task we wish to validate (like building the API
> > documentation, or building Arrow with some particular build
> > parameters) as a new Crossbow task -- document this well so that
> > people have some instructions to follow
>

Crossbow indeed lacks documentation in that respect. Defining a task
requires a CI configuration and commands per platform, plus a section in
tasks.yml. However, I think this is not straightforward enough: it should
be as simple as creating a bash/batch script, but we still need to define
configuration management pieces, which makes user friendliness harder to
achieve.

> > * How to add a task to some kind of a nightly build manifest
> > * Where to schedule and run the nightly jobs
>

Currently nightly builds are submitted by this nightly Travis script:
https://github.com/kszucs/crossbow/blob/trigger-nightly-builds/.travis.yml
We can have an arbitrary number of branches to trigger custom jobs, but
that requires manual Travis setup, and the ergonomics are still not
satisfying.

> > * Reporting nightly build failures to the mailing list
>

I regularly check the nightly builds, which occasionally fail, mostly
because of transient errors. For example, the last conda nightlies failed
because conda-build has some issues with libarchive; during the feedstock
updates I couldn't even rerender them locally.
BTW, to send the errors to the mailing list we need to set the
CROSSBOW_EMAIL env variable:
https://github.com/apache/arrow/blob/master/dev/tasks/crossbow.py#L475
(We might want to use a centralized crossbow repository with proper
permissions, though.)
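
To make the reporting step concrete, here is a minimal sketch. It assumes
a hypothetical helper that reads CROSSBOW_EMAIL from the environment and
formats a summary of the failed tasks. Only the variable name comes from
this thread; the function name, the subject line and the message layout
are made up for illustration and are not what crossbow.py does today.

```python
import os
from email.message import EmailMessage


def build_failure_report(failed_tasks, environ=os.environ):
    """Return an EmailMessage summarizing failed nightly tasks, or None.

    Hypothetical helper: `failed_tasks` maps a task name to the URL of its
    build log. Only the CROSSBOW_EMAIL variable name is taken from the
    thread; everything else is illustrative.
    """
    recipient = environ.get("CROSSBOW_EMAIL")
    if not recipient or not failed_tasks:
        # No address configured, or nothing failed: nothing to send.
        return None

    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"[NIGHTLY] {len(failed_tasks)} task(s) failed"
    lines = [f"- {name}: {url}" for name, url in sorted(failed_tasks.items())]
    msg.set_content("Failed nightly tasks:\n" + "\n".join(lines))
    return msg
```

Actually sending the message (e.g. via smtplib) is left out on purpose,
since that depends on where the centralized repository would run.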
> >
> > In terms of scalability requirements, this needs to accommodate 50-100
> > tasks.
>

The current tasks.yml contains a lot of duplication, which bothers me, but
it provides more flexibility than having another "matrix" definition and
implementation. I don't have a user-friendly solution for that yet.
Parallelization is another question: a single crossbow repo can run ~5
Travis jobs and a single AppVeyor job simultaneously, but we can improve
that by introducing more CI services, e.g. pipelines and/or CircleCI.

CI service agnostic? Ideally we should abstract away the CI service (the
worker itself), where we do the configuration management right now; see
the ".<service>.yml" files:
https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes
But then we need to create another custom (I hope not yml) "dialect" to
define build requirements (e.g. node, python, ruby, clang, etc.). It's
quite hard to design an easy and flexible interface for that.

> >
> > This won't be the last time we need to do some infrastructure work to
> > scale our testing process, but this will help with testing things that
> > we want to make sure work but without having to increase the size of
> > our CI matrix.
>
> One question which came to my mind is how to develop, debug and maintain
> the nightly tasks without waiting for the nightly Travis run for
> validation. It doesn't seem easy to trigger a "nightly" build from the
> Travis UI.
>

Good point! Triggering is not the actual issue; evaluating the outcome
is. We can submit builds if the PR touches e.g. the task definitions, but
we cannot really wait for the results, so triggering builds could be
useless. Actually, this can be solved by the GitHub integration bot Wes
has mentioned, with manual triggering and approval.

>
> Regards
>
> Antoine.
>

All in all, I feel usability is crucial here. A couple of examples of how
a straightforward task definition should look would be handy.
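
As a starting point for that discussion, here is a rough sketch of one
possible shape, assuming a task is just a name, a platform and a script,
and that a small parameter matrix is expanded into concrete tasks to avoid
the duplication in tasks.yml. The Task class, expand_matrix and the script
paths are all hypothetical; none of this exists in Crossbow.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Task:
    """Hypothetical task definition: a name, a platform and one script."""
    name: str
    platform: str      # e.g. "linux", "osx", "win"
    script: str        # path to a bash/batch script in the repository
    env: tuple = ()    # (variable, value) pairs passed to the script


def expand_matrix(base_name, scripts, params):
    """Expand one definition into concrete tasks, one per combination.

    `scripts` maps platform -> script path; `params` maps an environment
    variable name to the list of values it should take.
    """
    keys = sorted(params)
    tasks = []
    for platform, script in sorted(scripts.items()):
        # product() of no iterables yields a single empty combination,
        # so an empty `params` still produces one task per platform.
        for values in product(*(params[k] for k in keys)):
            env = tuple(zip(keys, values))
            suffix = "-".join(str(v) for v in values)
            parts = (base_name, platform, suffix)
            name = "-".join(part for part in parts if part)
            tasks.append(Task(name, platform, script, env))
    return tasks


# Illustrative usage with made-up script paths.
tasks = expand_matrix(
    "docs",
    {"linux": "ci/docs.sh", "win": "ci/docs.bat"},
    {"PYTHON_VERSION": ["3.6", "3.7"]},
)
```

The contributor only writes the scripts and a few lines of definition;
how each Task is mapped onto a particular CI service stays an open
question.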
Handling and defining task dependencies is another open question (I'm
experimenting with a prototype, though).

Regards, Krisztian