What is a "workernode" in this context? It isn't GoCD terminology, so it's unclear what you mean by it.
GoCD agents simply fork processes to run your tasks within the 'go' user
context of the agent process. IIRC the entire "wrapping" environment of the
agent process is propagated to the tasks, so there could be differences
depending on how you install and launch your agents. There's not really any
magic here, and the server has no synchronous role once the agent knows which
job needs to run and starts cloning/fetching materials and kicking off tasks.
You can see what the agent is doing for each job/task in the console log, to
see where the time is being spent.

If the agents are "static" and the jobs create mutable content locally (e.g.
virtualenvs or other such stuff), you might also want to consider enabling
"Clean working directory" at the stage level, to ensure a clean state before
your jobs' tasks run.

Other than that, it seems likely to me that there is some kind of
configuration at your host or OS user level (as Ketan hints at) that is
affecting mitogen/ansible: perhaps the way mitogen, ansible, or python is
installed, something different in the python environment, or some kind of
different configuration that is applied when run via the agent versus
directly on the node (ssh config? mitogen or ansible config?). I'd dump both
the env and the tool config from within a GoCD task and compare between the
"good" and "bad" setups. There is likely *something* different there in how
things are running.

-Chad

On Fri, May 5, 2023 at 2:07 PM 'Hans Dampf' via go-cd
<[email protected]> wrote:

> Ok, I did more testing and built a new setup from scratch. As expected,
> the performance was very good.
> Then we moved one of the old "broken" worker nodes from the old setup to
> the new setup, and unexpectedly the performance was also very good again.
>
> So there seems to be some slowdown on the go-server side or in the
> communication with the nodes.
>
> [email protected] wrote on Thursday, May 4, 2023 at 12:06:13 UTC+2:
>
>> > Is there maybe a cache file or lock file created by the agents which
>> > does not get deleted on deinstallation?
>>
>> This might help find anything owned by the go user:
>>
>> $ sudo find / -user go
>>
>> - Ketan
>>
>> On Thu, May 4, 2023 at 3:16 PM 'Hans Dampf' via go-cd
>> <[email protected]> wrote:
>>
>>> It's not just one task; the whole playbook is slower.
>>> Locally, yes, as user go.
>>> This runs with normal performance:
>>> go@host1:~$ ansible-playbook slowplaybook.yaml -i inventory
>>>
>>> On the same machine, the same playbook is slow when executed by the
>>> go-agent.
>>> It ran fast in the past, until the incident with heavy load on the
>>> agents and a big backlog:
>>> 100% usage of all 150 agents + 200 jobs in the backlog.
>>> Besides this, there were no changes to the playbook or the settings of
>>> the agents (env variables).
>>>
>>> Normally we only use about 40-50 agents and have no backlog.
>>>
>>> Is there maybe a cache file or lock file created by the agents which
>>> does not get deleted on deinstallation?
>>>
>>> [email protected] wrote on Thursday, May 4, 2023 at 10:43:29 UTC+2:
>>>
>>>> It's unclear from your problem description whether the entire job is
>>>> taking 10-30 minutes, or the task is taking 10-30 minutes. You mention
>>>> that running locally from the agent is quick; it is unclear if you're
>>>> running your task as the `go` user or the `root` user. For context,
>>>> there are other overheads in jobs, for example checking out code and
>>>> cleaning the working directory (if configured to do so). At the end of
>>>> all tasks, the agent will also upload all artifacts/console logs back
>>>> to the gocd server.
>>>>
>>>> If I were in your place, my next steps would be:
>>>>
>>>> - See if the script can be run in quiet mode. Maybe redirect the
>>>> output to /dev/null, if possible, and check how long it takes to run
>>>> just ansible+mitogen.
>>>> This is to eliminate possible issues or slowness caused by gocd
>>>> taking time to "read" the output from your deployment.
>>>> - Next, turn on more debug/verbose output in ansible + mitogen to see
>>>> whether there are things the gocd agent might be doing that could be
>>>> affecting your deploy timings: for example, any spurious environment
>>>> variables that gocd might be setting, or perhaps some SSH configs that
>>>> might be affecting the deployment.
>>>> - Run the `env` command before your job to dump the environment
>>>> variables that apply to that job. You can then `export` these
>>>> environment variables from the shell (as the `go` user) and run the
>>>> script again to see if there is any difference.
>>>>
>>>> - Ketan
>>>>
>>>> On Thu, May 4, 2023 at 2:03 PM 'Hans Dampf' via go-cd
>>>> <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> our setup consists of 10 workers with 15 agents each. We run ansible
>>>>> + mitogen on the agents. Currently, we have a problem with the
>>>>> go-agent + mitogen.
>>>>>
>>>>> Mitogen itself is a tool to speed up ansible runs by "tunneling"
>>>>> multiple tasks over one ssh connection:
>>>>> https://mitogen.networkgenomics.com/ansible_detailed.html
>>>>>
>>>>> If we use it on the worker directly on the cli, without the agent, it
>>>>> runs very well:
>>>>>
>>>>> Basic Ansible: ~5 min
>>>>> Ansible + Mitogen: ~1.5 min
>>>>> Ansible + Mitogen + Go-agent (expected): ~2 min
>>>>> Ansible + Mitogen + Go-agent (currently): ~10-30 min
>>>>>
>>>>> Now, if we start ansible with mitogen enabled IN the go-agent, the
>>>>> runtime is significantly longer than the basic run.
>>>>> Some runs slow down to 10-30 min, which is highly unusual since they
>>>>> should only take 2-5 min. Run directly on the cli, it's as fast as
>>>>> expected.
>>>>>
>>>>> Strangely, it was not like this from the beginning; it only started
>>>>> after an incident in which we had to stress all 150 agents at once.
>>>>> We already reinstalled ansible, mitogen, and the go-agent itself,
>>>>> but the degraded performance persists.
>>>>>
>>>>> I hope somebody can help with how to debug this further, since the
>>>>> last resort would be a complete reinstall of the whole worker nodes.
>>>>>
>>>>> Regards

--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/CAA1RwH-3vs43r3wUB1MgLErk3wzUxG4gtNBcXsSxC3QHuye7_Q%40mail.gmail.com.
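[Editorial footnote: Chad's suggestion to dump both env and tool config from within a GoCD task could be sketched roughly as below. This is a minimal sketch, not anything GoCD-specific; the /tmp paths are illustrative, the first capture would run as a task inside the pipeline, and the second in an interactive `go` shell on the same host.]

```shell
# Snapshot the environment as the GoCD job sees it (run this line as a task
# in the pipeline) and as an interactive shell sees it, then diff the two.
# The /tmp paths are illustrative.
env | sort > /tmp/env-agent.txt   # inside the GoCD job
env | sort > /tmp/env-shell.txt   # in an interactive 'go' login shell

# Variables set differently in the two contexts show up as +/- lines.
# diff exits non-zero when the files differ, so keep that from failing
# the GoCD task.
diff -u /tmp/env-shell.txt /tmp/env-agent.txt || true
```

The same pattern works for tool configuration: dump e.g. `ansible-config dump` in both contexts and diff those snapshots too.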

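[Editorial footnote: Ketan's dump-and-replay idea could likewise be sketched concretely. The file path below is illustrative; the first command would run as a task in the job, and the replay would happen later in an interactive `go` shell, using the playbook invocation from the thread.]

```shell
# Inside a GoCD task: save the job's exported variables in a sourceable
# form. 'export -p' prints them as quoted export/declare statements, so
# the file can be sourced later. The path is illustrative.
export -p > /tmp/gocd-job-env.sh

# Later, as the 'go' user on the same host, replay that environment and
# rerun the playbook to see whether the slowdown follows the environment:
#   . /tmp/gocd-job-env.sh
#   ansible-playbook slowplaybook.yaml -i inventory
```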