Karen:

That is one of the things on the table for discussion. However, the current CPT DB is 1.5 TB, which makes backups and replication a little more complicated. We're hoping to come up with a plan for the health and longevity of CPT at the PTC summit this year.

-- Scottchiefbaker

On 4/27/25 5:52 PM, Karen Etheridge wrote:
You might want to consider running a second database that is a read-only replica, and pointing a separate instance of the cpantesters API at that, for serving the stats for metacpan -- that way any excessive db load from that query will not disrupt the remaining systems.

On Sun, Apr 27, 2025 at 5:49 PM Doug Bell <d...@preaction.me> wrote:

    MetaCPAN has a periodic sync it does, which is likely expensive,
    yeah. I think I wrote in the ability to get statistics for reports
    submitted "since" a certain date/time, and I think I remember that
    query being hard to optimize. We might want to think about getting
    the Percona Monitoring thing
    <https://www.percona.com/software/database-tools/percona-monitoring-and-management/query-analytics>
    going to get some query-level performance stats.

    The cpantesters3 system being down, though, likely had a bunch of
    follow-on effects: It was still in the Fastly proxy rotation for
    the API and Legacy Metabase services. I've removed it from those
    services, so at the very least Fastly won't forward traffic to a
    dead server.

    The load on cpantesters4, though, is still less than 1. That's got
    me thinking that CPU/memory aren't the bottleneck causing the
    current problems...

    The load on the primary db (db-primary-1.cpantesters.org) is
    hovering around 6 (with 16 cores). That might be causing at least
    some of the pain.
    Getting the PMM dashboard up and moving the full text reports back
    out of the database will probably do wonders for the load on the
    database server.


    Doug Bell
    d...@preaction.me



    On Apr 26, 2025, at 1:26 AM, Slaven Rezic <sla...@rezic.de> wrote:

    On 25 Apr 2025 at 23:25, Scott Baker wrote:

    It was brought up on IRC that one of the big consumers of the
    CPT API is probably MetaCPAN. This may be contributing to some
    of the load issues we're seeing.

    Would it be possible to temporarily disable this traffic while
    CPT is running in degraded mode?

    At this point I'll do *anything* to get CPT stable again. The
    API has been down for over 36 hours now and I really need to do
    some testing.

    I am not sure what's going on... however, cpantesters3 (which
    maybe is supposed to handle the API requests?) is down, and while
    cpantesters4 looks like it has an internal api service, it seems
    like it never worked.

    Regards,
        Slaven
