Hi Ash, following up to say that I found the root cause of the issue.
What I discovered is that during each stage, depending on the scripts our code is running, memory and CPU usage will spike. These spikes can last anywhere from a few seconds up to about 30 seconds. In our Kubernetes elastic agent profile YAML files, I had defined memory and CPU requests and limits. I did this because, when running multiple pipelines, the pods would saturate only one node while the other nodes stayed idle. Those requests and limits turned out to be what was causing the pods to either hang or be killed by Kubernetes. It was an oversight on my part, because I kept playing with the CPU and memory allocations in the file (e.g. memory 1Gi - 4Gi and CPU 1.0 - 4.0) with no improvement. Once I removed the memory and CPU settings from the YAML file and allowed Kubernetes to handle the distribution itself, none of the pods died. I did notice one or two pipelines that handle our very CPU- and memory-heavy builds still hang, but I can adjust the instances to accommodate that load on the cluster. (A sketch of the kind of resource block I removed is at the end of this message.)

[image: Screen Shot 2022-04-12 at 10.48.24 PM.png]

On Tuesday, April 12, 2022 at 9:41:12 AM UTC-4 [email protected] wrote:

> This behaviour of GoCD usually points to the agent process dying mid-way;
> GoCD automatically re-assigns the work to another agent, and it starts
> from scratch. Can you check the agent process logs for the earlier runs to
> see if there are any exceptions that might have caused the GoCD server to
> reassign the work to another agent?
>
> Sometimes it could be the pipeline itself that's killing the agent process
> for a variety of reasons.
>
> On Tue, 12 Apr 2022 at 19:02, Sifu Tian <[email protected]> wrote:
>
>> [image: Screen Shot 2022-04-12 at 9.21.28 AM.png]
>>
>> Hi all,
>>
>> I am seeing some unusual behavior on random pipelines. When a pipeline
>> runs, it runs fine until the job gets to a certain point, then it starts
>> all over again, pulling materials and running the same task. The first
>> task appears to hang or just stop, and a new copy of the same job is run.
>> The pipeline never fails; it just keeps running and spawning the same job
>> over and over. The K8s cluster status page shows only one pod, but the
>> console shows that a new pod was issued.
>>
>> I am using the Kubernetes elastic agent plugin.
>> GoCD server and agent are at 22.1.
>>
>> Any thoughts or help would be greatly appreciated.
>> [image: Screen Shot 2022-04-12 at 9.23.20 AM.png]
>
> --
> Ashwanth Kumar / ashwanthkumar.in
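For anyone who hits the same thing, this is roughly the shape of the pod configuration I had in the elastic agent profile before the fix. The pod/container names, labels, image tag, and exact values below are placeholders for illustration, not our real profile; the point is the resources block, which is what I deleted.

```yaml
# Rough sketch of the elastic agent profile pod configuration BEFORE the fix.
# Names, labels, and the image tag are placeholders; only the resources block
# matters here -- deleting it let Kubernetes place the agent pods itself.
apiVersion: v1
kind: Pod
metadata:
  name: gocd-agent-example            # placeholder name
  labels:
    app: gocd-agent
spec:
  containers:
    - name: gocd-agent-container
      image: gocd/gocd-agent-alpine-3.15:v22.1.0   # example agent image
      resources:                      # <-- this whole block is what I removed
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "4Gi"               # builds spiking past this get OOM-killed
          cpu: "4"                    # CPU above this is throttled, which can stall the agent
```

With the resources block gone, the scheduler spreads the agent pods based on actual node availability; for the one or two very heavy builds I may still give those specific profiles their own settings or larger instances rather than constraining every agent.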
