Great that you were able to figure out the root cause. Cheers.
On Wed, Apr 13, 2022, 08:20 Sifu Tian <[email protected]> wrote:

> Hi Ash,
>
> I wanted to provide a follow-up and say that I found the root cause of the issue.
>
> What I discovered is that during each stage, depending on the scripts our code is running, memory and CPU usage will spike. These spikes could last from a few seconds up to 30 seconds.
>
> Within our Kubernetes elastic agent profile YAML files, I had defined memory and CPU requests and limits. I did this because, when running multiple pipelines, the pods would saturate only one node while the other nodes stayed idle.
>
> This was the cause of the pods either hanging or being killed by Kubernetes. It was an oversight because I kept adjusting the CPU and memory allocations in the file (e.g. memory 1Gi - 4Gi and CPU 1.0 - 4.0) with no improvement.
>
> Once I removed the specified memory and CPU from the YAML file and allowed Kubernetes to handle the distribution, none of the pods died. I did notice one or two pipelines that handle our very CPU- and memory-heavy builds hang, but I can adjust different instances to accommodate the load on the cluster.
>
> [image: Screen Shot 2022-04-12 at 10.48.24 PM.png]
>
> On Tuesday, April 12, 2022 at 9:41:12 AM UTC-4 [email protected] wrote:
>
>> This behaviour of GoCD usually points to the agent process dying mid-way; GoCD automatically reassigns the work to another agent, which then starts from scratch. Can you check the agent process logs for the earlier runs to see if there are any exceptions that might have caused the GoCD server to reassign the work to another agent?
>>
>> Sometimes it could be the pipeline itself that's killing the agent process, for a variety of reasons.
>>
>> On Tue, 12 Apr 2022 at 19:02, Sifu Tian <[email protected]> wrote:
>>
>>> [image: Screen Shot 2022-04-12 at 9.21.28 AM.png]
>>>
>>> Hi all,
>>>
>>> I am seeing some unusual behavior on random pipelines. When a pipeline runs, it runs fine until the job gets to a certain point and then starts all over again, pulling materials and running the same task. The first task appears to hang or just stop, and a new instance of the same job is run. The pipeline never fails; it just continues to run and spawns the same job over and over. The Kubernetes cluster status page shows only one pod, but the console shows that a new pod was issued.
>>>
>>> I am using the Kubernetes elastic agent plugin.
>>> GoCD server and agent are at 22.1.
>>>
>>> Any thoughts or help would be greatly appreciated.
>>>
>>> [image: Screen Shot 2022-04-12 at 9.23.20 AM.png]
>>>
>>
>> --
>>
>> Ashwanth Kumar / ashwanthkumar.in
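
For anyone hitting the same symptom: the settings Sifu describes are the standard Kubernetes resources stanza inside the elastic agent profile's pod specification. A minimal sketch using the ranges mentioned above is shown here; the pod name, container image, and surrounding fields are placeholders for illustration, not taken from his actual profile.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gocd-elastic-agent            # hypothetical name
    spec:
      containers:
        - name: gocd-agent
          image: gocd/gocd-agent-docker-dind:v22.1.0   # assumed image/tag
          resources:
            requests:                      # guaranteed allocation per agent pod
              memory: "1Gi"
              cpu: "1"
            limits:                        # exceeding these gets the pod throttled or OOM-killed
              memory: "4Gi"
              cpu: "4"

Removing (or at least loosening) the limits hands throttling and eviction decisions back to Kubernetes, which matches the behaviour Sifu reports once the memory and CPU settings were dropped from the profile.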
