You might want to consider running a second database that is a read-only replica, and pointing a separate instance of the cpantesters API at that, for serving the stats for metacpan -- that way any excessive db load from that query will not disrupt the remaining systems.
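A minimal sketch of that split, assuming the routing decision is made in front of the API, for example at the proxy layer. The backend names and the `/stats/` path prefix are hypothetical placeholders, not the real cpantesters endpoints or hosts:

```python
# Hypothetical backend names: one API instance reads from the primary
# database, the other is a separate instance pointed at a read-only replica.
MAIN_BACKEND = "api-main (primary db)"
STATS_BACKEND = "api-stats (read-only replica)"

# Hypothetical path prefixes for the expensive statistics queries.
STATS_PREFIXES = ("/stats/",)


def pick_backend(method: str, path: str) -> str:
    """Send read-only stats requests to the replica-backed instance.

    Everything else, including any write, goes to the main instance.
    """
    is_read = method.upper() in ("GET", "HEAD")
    is_stats = path.startswith(STATS_PREFIXES)
    return STATS_BACKEND if is_read and is_stats else MAIN_BACKEND


if __name__ == "__main__":
    for method, path in [
        ("GET", "/stats/since/2025-04-01"),
        ("POST", "/stats/since/2025-04-01"),
        ("GET", "/report/123"),
    ]:
        print(method, path, "->", pick_backend(method, path))
```

The point of keeping the rule this narrow is that only the known-expensive read traffic moves, so a slow stats query can at worst saturate the replica.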
On Sun, Apr 27, 2025 at 5:49 PM Doug Bell <d...@preaction.me> wrote:

> MetaCPAN has a periodic sync it does, which is likely expensive, yeah. I
> think I wrote in the ability to get statistics for reports submitted
> "since" a certain date/time, and I think I remember that query being hard
> to optimize. We might want to think about getting the Percona Monitoring
> thing
> <https://www.percona.com/software/database-tools/percona-monitoring-and-management/query-analytics>
> going to get some query-level performance stats.
>
> The cpantesters3 system being down, though, likely had a bunch of
> follow-on effects: It was still in the Fastly proxy rotation for the API
> and Legacy Metabase services. I've removed it from those services, so at
> the very least Fastly won't forward traffic to a dead server.
>
> The load on cpantesters4, though, is still less than 1. That's got me
> thinking that CPU/memory aren't the bottleneck causing the current
> problems...
>
> The load on the primary db (db-primary-1.cpantesters.org) is hovering
> around 6 (with 16 cores). That might be causing at least some of the pain.
> Getting the PMM dashboard up and moving the full text reports back out of
> the database will probably do wonders for the load on the database server.
>
> Doug Bell
> d...@preaction.me
>
> On Apr 26, 2025, at 1:26 AM, Slaven Rezic <sla...@rezic.de> wrote:
>
> On 25 Apr 2025 at 23:25, Scott Baker wrote:
>
> It was brought up on IRC that one of the big consumers of the CPT API is
> probably MetaCPAN. This may be contributing to some of the load issues
> we're seeing.
>
> Would it be possible to temporarily disable this traffic while CPT is
> running in degraded mode?
>
> At this point I'll do *anything* to get CPT stable again. The API has
> been down for over 36 hours now and I really need to do some testing.
>
> I am not sure what's going on... however, cpantesters3 (which maybe is
> supposed to handle the API requests?) is down, and while cpantesters4 looks
> like it has an internal api service, it seems like it never worked.
>
> Regards,
> Slaven