Ok, did more testing and built a new setup from scratch. As expected, the performance was very good. Then we moved one of the old "broken" worker nodes from the old setup to the new setup and, unexpectedly, the performance was very good again there too.
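To back up a suspicion like "the slowdown is on the go-server side", it helps to time the identical playbook run in each context with output discarded. Here is a minimal sketch (the helper name is my own; the playbook/inventory names are the placeholders already used in this thread):

```shell
#!/bin/sh
# Hypothetical timing helper, not from the thread: run the same command in
# different contexts (plain shell as the go user vs. inside a go-agent task)
# and compare wall-clock seconds. Output is discarded so that console-log
# streaming does not influence the measurement.
run_timed() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example usage (playbook/inventory names are placeholders):
# run_timed ansible-playbook slowplaybook.yaml -i inventory
```

Running this once from a login shell as `go` and once as a task in the go-agent gives two directly comparable numbers, with GoCD's output handling taken out of the picture.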
So there seems to be some slowdown on the go-server side or in the communication with the nodes.

[email protected] wrote on Thursday, May 4, 2023 at 12:06:13 UTC+2:

> > Is there maybe a cache file or lock file created by the agents which does
> > not get deleted by a deinstallation?
>
> This might help find anything owned by the go user:
>
> $ sudo find / -user go
>
> - Ketan
>
> On Thu, May 4, 2023 at 3:16 PM 'Hans Dampf' via go-cd <[email protected]> wrote:
>
>> It's not just one task; it's the whole playbook which is slower.
>> Locally, yes, as the go user.
>> This runs with normal performance:
>> go@host1:~$ ansible-playbook slowplaybook.yaml -i inventory
>>
>> On the same machine, the same playbook executed by the go-agent is slow.
>> It ran fast in the past, until the incident with the heavy load on the
>> agents and the big backlog: 100% usage of all 150 agents plus 200 jobs in
>> the backlog.
>> Besides this, there were no changes to the playbook or to the settings of
>> the agents (environment variables).
>>
>> Normally we only use about 40-50 agents and have no backlog.
>>
>> Is there maybe a cache file or lock file created by the agents which does
>> not get deleted by a deinstallation?
>>
>> [email protected] wrote on Thursday, May 4, 2023 at 10:43:29 UTC+2:
>>
>>> It's unclear from your problem description whether the entire job is
>>> taking 10-30 minutes, or just the task. You mention that running locally
>>> from the agent is quick; it is unclear if you're running your task as the
>>> `go` user or the `root` user. For context, there are other overheads in
>>> jobs, for example checking out code and cleaning the working directory
>>> (if configured to do so). At the end of all tasks, the agent will also
>>> upload all artifacts/console logs back to the GoCD server.
>>>
>>> If I were in your place, I would take the following next steps:
>>>
>>> - See if the script can be run in quiet mode.
>>>   Maybe redirect the output to /dev/null, if possible, and check how long
>>>   it takes to run just ansible + mitogen. This is to rule out GoCD being
>>>   slow to "read" the output from your deployment.
>>> - Next, turn on more debug/verbose output in ansible + mitogen to see if
>>>   there are things the gocd agent might be doing that could be affecting
>>>   your deploy timings, for example spurious environment variables that
>>>   gocd might be setting, or perhaps some SSH configs that might be
>>>   affecting the deployment.
>>> - Run the `env` command before your job to dump the environment variables
>>>   that apply to that job. You can then `export` these environment
>>>   variables from a shell (as the `go` user) and run the script there to
>>>   see if there is any difference.
>>>
>>> - Ketan
>>>
>>> On Thu, May 4, 2023 at 2:03 PM 'Hans Dampf' via go-cd <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> our setup consists of 10 workers with 15 agents each. We run ansible +
>>>> mitogen on the agents. Currently, we have a problem with the go-agent +
>>>> mitogen.
>>>>
>>>> Mitogen itself is a tool to speed up ansible runs by "tunneling"
>>>> multiple tasks over one SSH connection:
>>>> https://mitogen.networkgenomics.com/ansible_detailed.html
>>>>
>>>> If we use it on the worker directly on the CLI, without the agent, it
>>>> runs very well:
>>>>
>>>> Basic Ansible: ~5 min
>>>> Ansible + Mitogen: ~1.5 min
>>>> Ansible + Mitogen + Go-agent (expected): ~2 min
>>>> Ansible + Mitogen + Go-agent (currently): ~10-30 min
>>>>
>>>> Now, if we start ansible with mitogen enabled IN the go-agent, the
>>>> runtime is significantly longer than the basic run. Some runs slow down
>>>> to 10-30 min, which is highly unusual since it should only take 2-5 min.
>>>> Run directly on the CLI, it is as fast as expected.
>>>>
>>>> Strangely, this was not the case from the beginning.
>>>> It started only after an incident in which we had to stress all 150
>>>> agents at once.
>>>>
>>>> We already reinstalled ansible, mitogen and the go-agent itself, but the
>>>> degraded performance persists.
>>>>
>>>> I hope somebody can help with how to debug this further, since the last
>>>> resort would be a complete reinstall of the whole worker nodes.
>>>>
>>>> Regards

--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/go-cd/5b9bf950-3c9d-436c-be48-24ecfef342dfn%40googlegroups.com.
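Ketan's `env` suggestion in the thread can be sketched as follows. The helper name and the file paths are illustrative, not from the thread; it assumes a job task first ran `env | sort > /tmp/job_env.txt` on the agent, and that a login shell as the go user ran `env | sort > /tmp/shell_env.txt`:

```shell
#!/bin/sh
# Sketch of the env-comparison step (helper name and paths are illustrative).
# Prints variables present only in the job environment: candidates for
# GoCD-injected settings that could affect ansible/mitogen behaviour.
job_only_vars() {
    # $1 = sorted env dump from the go-agent job,
    # $2 = sorted env dump from a plain shell as the go user.
    # comm -13 suppresses lines unique to $2 and lines common to both,
    # leaving only the lines unique to the job's environment.
    comm -13 "$2" "$1"
}

# Example usage:
# job_only_vars /tmp/job_env.txt /tmp/shell_env.txt
```

Exporting the printed variables in a plain shell and re-running the playbook there would then show whether one of them reproduces the slowdown outside the agent.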
