Those machines are managed by apache-infra, so we are not able to install or update tools on them. We plan to set up a new instance group for Beam Jenkins, since we are required to update the OS to the latest Ubuntu. Installing fresh, supported dependencies on the new machines could also let us remove the restrictions on Python tests. For now, though, I can't say when we will have the new Jenkins VMs, because the latest OS image is not available yet.
Yifan Zou

On Mon, Apr 16, 2018 at 3:10 PM Robert Bradshaw <rober...@google.com> wrote:

> Thanks.
>
> In the short term, I could try to limit precommits to these two machines
> following that example, but presumably that would mean longer queues. Who
> owns these machines? Could we just wipe them and install fresh, modern,
> consistent OS/environments on them? (The container story seems like a great
> long-term solution, especially for local reproducibility, but probably not
> as easy...)
>
>
> On Mon, Apr 16, 2018 at 2:33 PM Yifan Zou <yifan...@google.com> wrote:
>
>> The Jenkins worker configuration is a pain point for Beam builds and
>> tests, and it is indeed difficult to debug. Originally, Python tests such
>> as beam_PostCommit_Python_Verify ran on only one worker due to BEAM-1817
>> <https://issues.apache.org/jira/projects/BEAM/issues/BEAM-3395?filter=allissues>.
>> We probably need to do the same thing for
>> beam_PreCommit_Python_GradleBuild in the short term.
>> To solve this problem, we did research and experiments on running
>> Jenkins tests within a container and put together a short document. It
>> is being reviewed within the Engprod team and will be shared for wider
>> review shortly.
>>
>> Yifan Zou
>>
>> On Mon, Apr 16, 2018 at 1:10 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> I've been trying to debug why beam_PreCommit_Python_GradleBuild seems to
>>> be failing so often, and it looks like the beam-sdks-python:setupVirtualenv
>>> task succeeds on beam2 and beam6, but always fails on beam1, beam3,
>>> beam4, and beam8. (I didn't see any runs on beam5 or beam7; I vaguely
>>> remember beam5 being blacklisted...) I can't reproduce the failure
>>> locally, and the remote logs (e.g.
>>> https://builds.apache.org/job/beam_PreCommit_Python_GradleBuild/471/console
>>> ) don't seem to be very enlightening either. This leads to a couple of
>>> questions:
>>>
>>> * How are our Jenkins Beam workers configured, and why aren't they the
>>> same?
>>> * How does one go about debugging failures like this?
>>>
>>> Before too much effort is invested, how far are we from using containers
>>> to manage the build environments?
>>>
>>
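One low-tech way to check the "passes on beam2 and beam6, fails on beam1, beam3, beam4 and beam8" pattern across more runs is to tally recent build results by the worker each build ran on. This is a minimal sketch, not existing Beam tooling: `fetch_builds` and `failure_rate_by_node` are hypothetical helpers, and it assumes the job is a freestyle job whose builds expose `result` and `builtOn` through the Jenkins JSON remote API (`api/json?tree=...`). The sample data is made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def fetch_builds(job_url: str) -> List[dict]:
    """Fetch recent builds of a Jenkins job via its JSON remote API.

    Assumes a freestyle job, whose builds report their worker in `builtOn`.
    """
    query = urllib.parse.urlencode({"tree": "builds[number,result,builtOn]"})
    url = job_url.rstrip("/") + "/api/json?" + query
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["builds"]


def failure_rate_by_node(
    builds: Iterable[Mapping[str, object]]
) -> Dict[str, Tuple[int, int]]:
    """Return {node: (failed_builds, finished_builds)}, sorted by node name."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for build in builds:
        result = build.get("result")
        if result is None:  # build still running, no verdict yet
            continue
        # Jenkins reports an empty builtOn for the built-in (master) node.
        node = str(build.get("builtOn") or "(built-in)")
        counts[node][1] += 1
        if result != "SUCCESS":  # FAILURE, UNSTABLE and ABORTED all count
            counts[node][0] += 1
    return {node: (c[0], c[1]) for node, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical sample entries, shaped like the JSON API's "builds" list.
    sample = [
        {"number": 1, "result": "SUCCESS", "builtOn": "beam2"},
        {"number": 2, "result": "FAILURE", "builtOn": "beam1"},
        {"number": 3, "result": "FAILURE", "builtOn": "beam1"},
        {"number": 4, "result": None, "builtOn": "beam6"},
    ]
    print(failure_rate_by_node(sample))  # {'beam1': (2, 2), 'beam2': (0, 1)}
```

Against the live job this would be `failure_rate_by_node(fetch_builds("https://builds.apache.org/job/beam_PreCommit_Python_GradleBuild/"))`. A worker whose failure count equals its total, like beam1 in the sample, is then a candidate for diffing its installed tools against a worker that passes.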