CPT Community:
Here is a quick update on some changes we've made to stabilize
*cpantesters.org*. Preaction was able to get a handful of us SSH access
to the CPT servers. The "outages" and latency we've all noticed lately
have been related to Apache resource exhaustion on the CPT4 web server
that handles www, api, reports, etc. We're still not 100% sure what is
causing Apache to go unresponsive, but we wanted to let everyone know
we've made some improvements in the last week:
* Increased Apache listeners to facilitate higher levels of traffic
* Preaction reconfigured the Fastly CDN to remove a failed node
* Added an 8GB swapfile to give the box some extra memory headroom
* Cleaned up API.pm to fix a Mojolicious configuration artifact
* Installed monitoring tools to watch server health in real time
Next we will need to investigate the health of the DB server as well. If
you have MySQL skills, we could use your help diagnosing the high load
we're seeing. We have a rather large MySQL DB (~1.5TB) that contains about 20
years of CPT history. The DB server gets ~37 queries per second and has
a sustained throughput of about 600Mb/s of reads from the disks. From
what I can tell the DB server is healthy and functional, but it's
working /*hard*/.
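For anyone who wants a feel for those numbers, here is a rough back-of-envelope sketch (not a diagnosis) of how much disk read each query costs on average. It only uses the approximate figures quoted above. The function name is made up for illustration, and since "Mb" could be read as megabits or megabytes, it works out both:

```python
# Back-of-envelope: average disk read per query, from the rough figures above.
# Assumption: "600Mb/s" may mean megabits or megabytes, so we try both readings.

QUERIES_PER_SEC = 37   # ~37 queries per second, as quoted
READ_RATE = 600        # "600Mb/s" of sustained disk reads, as quoted


def read_per_query_mb(rate, qps, unit_is_bits):
    """Return the average megabytes read from disk per query."""
    megabytes_per_sec = rate / 8 if unit_is_bits else rate
    return megabytes_per_sec / qps


if __name__ == "__main__":
    for label, is_bits in (("megabits", True), ("megabytes", False)):
        mb = read_per_query_mb(READ_RATE, QUERIES_PER_SEC, is_bits)
        print(f"if Mb means {label}: ~{mb:.1f} MB read per query")
```

Under either reading that is megabytes of disk read per query on average, which may point toward large scans or a working set that does not fit in memory. That is only a guess from two averages, and it is exactly the kind of thing we'd like a MySQL person to confirm or rule out.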
If you have expertise in this field, please reach out or speak up at
PTC so we can not only stabilize CPT but improve it and take it to the
next level.
-- Scottchiefbaker