Karen:
That is one of the things on the table for discussion. However, the
current CPT DB is 1.5 TB, which makes backups and replication a little
more complicated. We're hoping to come up with a plan for the health and
longevity of CPT at the PTC summit this year.
-- Scottchiefbaker
On 4/27/25 5:52 PM, Karen Etheridge wrote:
You might want to consider running a second database that is a
read-only replica, and pointing a separate instance of the cpantesters
API at that, for serving the stats for metacpan -- that way any
excessive db load from that query will not disrupt the remaining systems.
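A minimal sketch of the routing idea above, assuming one API instance per consumer that picks its database by a simple lookup. The DSNs, the consumer names, and the `dsn_for` helper are all hypothetical; nothing in this thread says how the cpantesters API is actually configured.

```python
# Hypothetical connection strings: the real hosts are not given in this thread.
PRIMARY_DSN = "db://db-primary.example.org/cpantesters"
REPLICA_DSN = "db://db-replica.example.org/cpantesters"

# Hypothetical consumer names that only ever read (e.g. the stats sync).
READ_ONLY_CONSUMERS = {"metacpan-stats"}


def dsn_for(consumer: str) -> str:
    """Send read-only consumers to the replica, everything else to the primary."""
    if consumer in READ_ONLY_CONSUMERS:
        return REPLICA_DSN
    return PRIMARY_DSN


if __name__ == "__main__":
    print(dsn_for("metacpan-stats"))
    print(dsn_for("report-ingest"))
```

With a split like this, an expensive stats query can slow down only the replica, while writes and the other services keep using the primary.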
On Sun, Apr 27, 2025 at 5:49 PM Doug Bell <d...@preaction.me> wrote:
MetaCPAN has a periodic sync it does, which is likely expensive,
yeah. I think I wrote in the ability to get statistics for reports
submitted "since" a certain date/time, and I think I remember that
query being hard to optimize. We might want to think about getting
the Percona Monitoring thing
<https://www.percona.com/software/database-tools/percona-monitoring-and-management/query-analytics>
going to get some query-level performance stats.
The cpantesters3 system being down, though, likely had a bunch of
follow-on effects: It was still in the Fastly proxy rotation for
the API and Legacy Metabase services. I've removed it from those
services, so at the very least Fastly won't forward traffic to a
dead server.
The load on cpantesters4, though, is still less than 1. That's got
me thinking that CPU/memory aren't the bottleneck causing the
current problems...
The load on the primary db (db-primary-1.cpantesters.org) is
hovering around 6 (with 16 cores). That might be causing at least
some of the pain.
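For scale, the load figure above can be normalized by core count. This is only a rough sketch: `load_per_core` is a hypothetical helper, and a load average alone does not show whether the pressure comes from CPU or from disk I/O.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


if __name__ == "__main__":
    # Figures quoted in the message: load around 6 on a 16-core machine.
    print(load_per_core(6, 16))  # 0.375
```

A value below 1.0 per core means the run queue is not saturating the CPUs on average, so slow queries or I/O waits are at least as plausible a cause as raw CPU exhaustion.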
Getting the PMM dashboard up and moving the full text reports back
out of the database will probably do wonders for the load on the
database server.
Doug Bell
d...@preaction.me
On Apr 26, 2025, at 1:26 AM, Slaven Rezic <sla...@rezic.de> wrote:
On 25. 04. 2025 at 23:25, Scott Baker wrote:
It was brought up on IRC that one of the big consumers of the
CPT API is probably MetaCPAN. This may be contributing to some
of the load issues we're seeing.
Would it be possible to temporarily disable this traffic while
CPT is running in degraded mode?
At this point I'll do *anything* to get CPT stable again. The
API has been down for over 36 hours now and I really need to do
some testing.
I am not sure what's going on... however, cpantesters3 (which
maybe is supposed to handle the API requests?) is down, and while
cpantesters4 looks like it has an internal api service, it seems
like it never worked.
Regards,
Slaven