You might want to consider running a second database that is a read-only
replica, and pointing a separate instance of the cpantesters API at that,
for serving the stats for metacpan -- that way any excessive db load from
that query will not disrupt the remaining systems.
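
Roughly what I have in mind, as a sketch only. The backend names and path prefixes below are placeholders I made up for illustration, not the real cpantesters or Fastly configuration; the idea is just that read-only stats requests go to the API instance backed by the replica and everything else stays on the primary.

```python
# Hypothetical backend names; the real instance names are not given in this thread.
PRIMARY_BACKEND = "api-primary"   # API instance backed by the primary database
STATS_BACKEND = "api-replica"     # separate API instance backed by a read-only replica

# Assumed path prefixes for the stats endpoints that MetaCPAN syncs from.
STATS_PREFIXES = ("/v3/release", "/v3/summary")


def pick_backend(method: str, path: str) -> str:
    """Return the backend that should serve a request.

    Only read-only (GET/HEAD) requests under the stats prefixes are sent
    to the replica-backed instance. Everything else, including every
    write, stays on the primary-backed instance.
    """
    is_read = method.upper() in ("GET", "HEAD")
    if is_read and path.startswith(STATS_PREFIXES):
        return STATS_BACKEND
    return PRIMARY_BACKEND
```

With a split like this, a slow stats query can at worst saturate the replica, and report submission keeps writing to the primary.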

On Sun, Apr 27, 2025 at 5:49 PM Doug Bell <d...@preaction.me> wrote:

> MetaCPAN has a periodic sync it does, which is likely expensive, yeah. I
> think I wrote in the ability to get statistics for reports submitted
> "since" a certain date/time, and I think I remember that query being hard
> to optimize. We might want to think about getting the Percona Monitoring
> thing going to get some query-level performance stats:
> <https://www.percona.com/software/database-tools/percona-monitoring-and-management/query-analytics>
>
> The cpantesters3 system being down, though, likely had a bunch of
> follow-on effects: It was still in the Fastly proxy rotation for the API
> and Legacy Metabase services. I've removed it from those services, so at
> the very least Fastly won't forward traffic to a dead server.
>
> The load on cpantesters4, though, is still less than 1. That's got me
> thinking that CPU/memory aren't the bottleneck causing the current
> problems...
>
> The load on the primary db (db-primary-1.cpantesters.org) is hovering
> around 6 (with 16 cores). That might be causing at least some of the pain.
> Getting the PMM dashboard up and moving the full text reports back out of
> the database will probably do wonders for the load on the database server.
>
>
> Doug Bell
> d...@preaction.me
>
>
>
> On Apr 26, 2025, at 1:26 AM, Slaven Rezic <sla...@rezic.de> wrote:
>
> On 25 Apr 2025 at 23:25, Scott Baker wrote:
>
> It was brought up on IRC that one of the big consumers of the CPT API is
> probably MetaCPAN. This may be contributing to some of the load issues
> we're seeing.
>
> Would it be possible to temporarily disable this traffic while CPT is
> running in degraded mode?
>
> At this point I'll do *anything* to get CPT stable again. The API has
> been down for over 36 hours now and I really need to do some testing.
>
> I am not sure what's going on... however, cpantesters3 (which maybe is
> supposed to handle the API requests?) is down, and while cpantesters4 looks
> like it has an internal api service, it seems like it never worked.
>
> Regards,
>     Slaven
>
>
>
