FYI, it looks like all the Go tests are now failing because they can't find the Go command at all. Did a Jenkins image without Go (v1.16+) pre-installed get pushed?

On Mon, Oct 18, 2021, 1:45 PM Valentyn Tymofieiev <[email protected]> wrote:

> Thanks Daniel,
>
> I can recreate the VMs on new disks.
>
> We currently have a set of stopped Jenkins workers (named:
> apache-beam-jenkins-##) and running workers (named:
> apache-ci-beam-jenkins-##).
>
> Are there any concerns about deleting the stopped group of workers?
>
> On Mon, Oct 18, 2021 at 11:19 AM Ahmet Altay <[email protected]> wrote:
>
>> Thank you Daniel, Valentyn!
>>
>> On Mon, Oct 18, 2021 at 8:02 AM Daniel Oliveira <[email protected]> wrote:
>>
>>> I performed a light update of both Go and Python (from Valentyn's
>>> update) on each worker VM over the weekend. I also added additional
>>> instructions for the light update to Confluence (as an alternative to
>>> the current instructions).
>>>
>>> There is still reason to perform a full update at some point: Valentyn
>>> updated the VM image from 500 GB to 1000 GB of storage, which requires
>>> a full update to actually take effect.
>>>
>>> On Tue, Oct 12, 2021 at 10:32 AM Valentyn Tymofieiev <[email protected]> wrote:
>>>
>>>> > 3. SSH into the agent and perform the update.
>>>>
>>>> So, this would be a 'lite' version of the update, where we make
>>>> changes to the live worker without recreating the worker VM with a
>>>> new image? We could perhaps document both options, and also make it
>>>> clear that producing a VM image that has the necessary updates is
>>>> mandatory even if we perform 'lite' updates without recreating the
>>>> worker.
>>>>
>>>> Also, for a lite update, marking the Jenkins worker offline may be
>>>> optional, as some updates might not be disruptive (such as installing
>>>> some software that will not be used immediately).
>>>>
>>>> On Mon, Oct 11, 2021 at 7:53 PM Robert Burke <[email protected]> wrote:
>>>>
>>>>> SGTM. Thank you very much Daniel!
>>>>>
>>>>> On Mon, Oct 11, 2021, 7:51 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> Thank you Daniel.
>>>>>> Could you please update the wiki once you are done with the
>>>>>> process?
>>>>>>
>>>>>> On Mon, Oct 11, 2021 at 6:22 PM Daniel Oliveira <[email protected]> wrote:
>>>>>>
>>>>>>> Took me a bit to get to this, sorry. I finally figured out an
>>>>>>> approach for updating Go and did so and will be updating the image
>>>>>>> momentarily.
>>>>>>>
>>>>>>> I think a more important note is that I tried what Valentyn was
>>>>>>> considering, which is SSHing into workers and updating the
>>>>>>> dependency. I'll describe the process below, but the summary is
>>>>>>> that I did it on one worker with Go so far, saw no problems over
>>>>>>> the weekend, and would like to continue updating the rest of the
>>>>>>> workers if there are no objections.
>>>>>>>
>>>>>>> Here's a step-by-step of what I did. If we decide to stick with
>>>>>>> this approach, these instructions can be added to Confluence:
>>>>>>>
>>>>>>> 1. Go to the page for the Jenkins agent you want to update [1] and
>>>>>>>    click "Mark this node temporarily offline", leaving a reason
>>>>>>>    such as "Updating X dependency."
>>>>>>> 2. Wait until there are no more tests running in that agent (under
>>>>>>>    "Build Executor Status" on the left of the page).
>>>>>>> 3. SSH into the agent and perform the update.
>>>>>>> 4. Mark the node as online again.
>>>>>>> 5. Repeat for every worker.
>>>>>>>
>>>>>>> And these are some additional steps if you want to immediately run
>>>>>>> a test suite to check that the update worked correctly. For example
>>>>>>> in my case, I wanted to check against the Go Postcommit, and it was
>>>>>>> a good thing I did, because it actually failed the first time and I
>>>>>>> had to go back in to fix a small oversight I made. So doing this
>>>>>>> after you update your first worker is probably a good idea before
>>>>>>> updating the rest:
>>>>>>>
>>>>>>> 1. Go to the page for the job you want to run (for example: [2]).
>>>>>>> 2. Click "Configure" on the left menu.
>>>>>>> 3. Find the checkmark "Restrict where this project can be run" and
>>>>>>>    change the restriction from "beam" to the specific name of the
>>>>>>>    agent (ex. "apache-beam-jenkins-1").
>>>>>>> 4. Save and apply that change.
>>>>>>> 5. Back on the page for the job, click "Build with Parameters" on
>>>>>>>    the left menu.
>>>>>>> 6. Run the build on "master".
>>>>>>> 7. Once you're done checking the results, change the restriction
>>>>>>>    for the job back to "beam". (This also gets reset once every 24
>>>>>>>    hours in case you forget.)
>>>>>>>
>>>>>>> I did that on one agent (apache-beam-jenkins-2) on Friday evening
>>>>>>> when it wasn't too busy, and got Go updated and working. I checked
>>>>>>> that agent's execution history again today just in case, and it was
>>>>>>> healthy over the weekend, with no Go-related problems as far as I
>>>>>>> could see. If there are no objections I'd like to go ahead and
>>>>>>> continue updating the rest of the workers (I'll do this late at
>>>>>>> night or over the weekend to avoid disrupting dev work).
>>>>>>>
>>>>>>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-1/
>>>>>>> [2] https://ci-beam.apache.org/job/beam_PostCommit_Go/
>>>>>>>
>>>>>>> On Mon, Oct 4, 2021 at 6:14 PM Valentyn Tymofieiev <[email protected]> wrote:
>>>>>>>
>>>>>>>> I updated the image in [1], but did not change the workers to pick
>>>>>>>> up the new image yet. We can do this once we add Go changes on top
>>>>>>>> of it.
>>>>>>>>
>>>>>>>> I am also considering SSHing into every worker and running a
>>>>>>>> one-line command that adds the dependency that was missing. It
>>>>>>>> seems to be low risk, and there is a fall-back plan to re-start
>>>>>>>> the worker using the saved image - both new and old images are
>>>>>>>> saved and available in Cloud Console.
>>>>>>>>
>>>>>>>> Ideally, we should find a way to do a rolling upgrade that a PMC
>>>>>>>> or committer could trigger without logging into every machine.
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/BEAM-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424228#comment-17424228
>>>>>>>>
>>>>>>>> On Wed, Sep 22, 2021 at 3:28 PM Daniel Oliveira <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> @Brian Hulette <[email protected]> That button seems like exactly
>>>>>>>>> what we'd need. Doing it manually would be a pain, but it's
>>>>>>>>> probably still preferable to causing a bunch of aborted tests.
>>>>>>>>>
>>>>>>>>> @Valentyn Tymofieiev <[email protected]> Collaborating to do both
>>>>>>>>> updates at once is a great idea! I'll message you directly about
>>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> On Wed, Sep 22, 2021 at 2:44 PM Valentyn Tymofieiev <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I am also interested in updating the version of Python on the
>>>>>>>>>> VMs; I need to install Python 3.9. Thanks for looking into this.
>>>>>>>>>> We can coordinate to make one update instead of two.
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 22, 2021 at 2:40 PM Brian Hulette <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm not sure about best practices here. Out of curiosity I just
>>>>>>>>>>> poked around in the Jenkins UI (e.g. [1]) and it looks like you
>>>>>>>>>>> can manually "Mark node temporarily offline" when logged in (if
>>>>>>>>>>> you're a committer). According to [2] this will prevent it from
>>>>>>>>>>> picking up new jobs after it's finished the currently executing
>>>>>>>>>>> ones. Doing that manually for every worker could be a pain
>>>>>>>>>>> though.
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> [1] https://ci-beam.apache.org/computer/apache-beam-jenkins-13/
>>>>>>>>>>> [2] https://stackoverflow.com/questions/26553612/how-do-i-disable-a-node-in-jenkins-ui-after-it-has-completed-its-currently-runni
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 22, 2021 at 1:03 PM Daniel Oliveira <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm aiming at upgrading the version of Go on our Jenkins VMs,
>>>>>>>>>>>> and I found these instructions on upgrading software on Jenkins
>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips#JenkinsTips-HowtoinstallandupgradesoftwareonJenkinsworkers>
>>>>>>>>>>>> on our cwiki.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't started going through it yet, but I was wondering
>>>>>>>>>>>> about the last few steps that involve stopping VMs, deleting
>>>>>>>>>>>> boot disks, and restarting executors. Is there some best
>>>>>>>>>>>> practice for that section to avoid causing interruptions in our
>>>>>>>>>>>> automated testing? Should I be trying to do this outside of
>>>>>>>>>>>> peak dev hours, or going one VM at a time so others can pick up
>>>>>>>>>>>> extra load, or anything like that?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Daniel Oliveira
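
For reference, the 'lite' update procedure quoted in this thread (mark the agent offline, wait for running tests to finish, update over SSH, mark it online again, repeat for every worker, and check the first worker before doing the rest) can be sketched as a small control loop. This is only an illustration of the order of operations, not a tool anyone in the thread wrote: FakeAgent, lite_update, and rolling_update are hypothetical names, the agent is an in-memory stand-in, and nothing here talks to Jenkins or SSH. A real script would have to back those methods with whatever access is actually available.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeAgent:
    """In-memory stand-in for one Jenkins agent (hypothetical, not a Jenkins API)."""
    name: str
    running_builds: int = 0
    offline: bool = False
    events: List[str] = field(default_factory=list)

    def mark_offline(self, reason: str) -> None:
        self.offline = True
        self.events.append(f"offline: {reason}")

    def is_idle(self) -> bool:
        # Simulate one in-flight build finishing each time we poll.
        if self.running_builds > 0:
            self.running_builds -= 1
        return self.running_builds == 0

    def mark_online(self) -> None:
        self.offline = False
        self.events.append("online")


def lite_update(agent: FakeAgent, update: Callable[[FakeAgent], None], reason: str) -> None:
    """Steps 1-4: take the agent offline, let it drain, update it, bring it back."""
    agent.mark_offline(reason)
    while not agent.is_idle():
        pass  # a real script would sleep between polls
    # If update() raises, the agent deliberately stays offline for inspection.
    update(agent)
    agent.mark_online()


def rolling_update(
    agents: List[FakeAgent],
    update: Callable[[FakeAgent], None],
    verify: Callable[[FakeAgent], bool],
    reason: str,
) -> List[str]:
    """Step 5, one agent at a time, checking the first agent before the rest.

    Returns the names of the agents that were updated.
    """
    updated: List[str] = []
    for index, agent in enumerate(agents):
        lite_update(agent, update, reason)
        updated.append(agent.name)
        if index == 0 and not verify(agent):
            break  # stop before touching the remaining workers
    return updated


if __name__ == "__main__":
    fleet = [FakeAgent("apache-beam-jenkins-1", running_builds=2),
             FakeAgent("apache-beam-jenkins-2")]
    done = rolling_update(
        fleet,
        update=lambda a: a.events.append("updated Go"),
        verify=lambda a: True,  # stand-in for running a test suite on this agent
        reason="Updating X dependency.",
    )
    print(done)
    print(fleet[0].events)
```

Going one agent at a time keeps the other workers available to pick up load, and stopping after a failed check on the first agent mirrors the advice above to run a test suite before updating the rest.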
