To provide slightly more context here: we do have some ability to change what's on the build machines (they're puppetized, and the configurations are available at https://github.com/apache/infrastructure-puppet), but the image that they're running is Ubuntu 14.04, and the latest versions of some tools are simply not available for it. The machines are owned by Google and managed with the much-appreciated help of the Apache Infrastructure team.
Making a new image for the slaves has come up a couple of times, and a sufficiently motivated person could have some impact there by taking on that work, but it seems to me that given Jenkins has first-class support for running builds inside of Docker containers, that's probably what we want for long-term sustainability. I've chatted with Infra a bit about this issue; they said that they'll have 18.04 support ready in a couple of weeks, so in that time frame we could start looking at upgrading the image and getting Dockerized builds going.

Yifan, are you planning on owning this area?

On Mon, Apr 16, 2018 at 4:02 PM Yifan Zou <yifan...@google.com> wrote:

> Those machines are managed by apache-infra, that we are not able to
> install/update tools on them. We plan to have a new instance group for beam
> Jenkins since we are required to update OS to the latest Ubuntu. With
> fresh, supportive dependencies installed on new machines could also get rid
> of restrictions on python tests. But for now, I can hardly tell when we
> could have new Jenkins VMs since the latest OS image is not available yet.
>
> Yifan Zou
>
>
> On Mon, Apr 16, 2018 at 3:10 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Thanks.
>>
>> In the short term, I could try to limit precommits to these two machines
>> following that example, but presumably that would mean longer queues. Who
>> owns these machines? Could we just wipe them and install fresh, modern,
>> consistent OS/environments on them? (The container story seems like a great
>> long-term solution, especially for local reproducibility, but probably not
>> as easy...)
>>
>>
>> On Mon, Apr 16, 2018 at 2:33 PM Yifan Zou <yifan...@google.com> wrote:
>>
>>> The Jenkins worker configurations is a pain point of beam build and
>>> tests, and it is indeed difficult to debug. Originally, python tests such
>>> as beam_PostCommit_Python_Verify only run on one worker due to BEAM-1817
>>> <https://issues.apache.org/jira/projects/BEAM/issues/BEAM-3395?filter=allissues>.
>>> We probably need to do the same thing for
>>> beam_PreCommit_Python_GradleBuild in short term.
>>> In order to solve this problem, we did research and experiments on
>>> running Jenkins tests within a container and organized a short
>>> documentation. It is being reviewed within Engprod team and will be shared
>>> for wider review shortly.
>>>
>>> Yifan Zou
>>>
>>> On Mon, Apr 16, 2018 at 1:10 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> I've been trying to debug why beam_PreCommit_Python_GradleBuild seems to be
>>>> failing so often, and it looks like the beam-sdks-python:setupVirtualenv
>>>> task succeeds on beam2 and beam6, but always fails on beam1, beam3, beam4,
>>>> and beam8. (I didn't see any runs on beam5 or beam7, I vaguely seem to
>>>> remember beam5 being blacklisted...) I can't reproduce the failure locally
>>>> and the remote logs (e.g.
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Python_GradleBuild/471/console
>>>> ) don't seem to be very enlightening either. This leads to a couple of
>>>> questions:
>>>>
>>>> * How are our jenkins beam workers configured, and why aren't they the
>>>> same?
>>>> * How does one go about debugging failures like this?
>>>>
>>>> Before too much effort is invested, how far are we from using containers to
>>>> manage the build environments?
>>>>
>>>

--
-------
Jason Kuster
Apache Beam / Google Cloud Dataflow

See something? Say something. go/jasonkuster-feedback