Indeed, we have a `db-replica-1` server, but it (a) appears to be broken at the moment, and (b) was designed specifically for read-only backups using ZFS delta snapshotting. This is one of the things that ServerCentral set up for us that is _far beyond_ my knowledge/expertise.

Doug Bell
d...@preaction.me

> On Apr 28, 2025, at 10:44 AM, Scott Baker <sc...@perturb.org> wrote:
>
> Karen:
>
> That is one of the things on the table for discussion. However, the current
> CPT DB is 1.5TB which makes backups and replication a little more
> complicated. We're hoping to come up with a plan for the health and longevity
> for CPT at the PTC summit this year.
>
> -- Scottchiefbaker
>
> On 4/27/25 5:52 PM, Karen Etheridge wrote:
>> You might want to consider running a second database that is a read-only
>> replica, and pointing a separate instance of the cpantesters API at that,
>> for serving the stats for metacpan -- that way any excessive db load from
>> that query will not disrupt the remaining systems.
>>
>> On Sun, Apr 27, 2025 at 5:49 PM Doug Bell <d...@preaction.me> wrote:
>>> MetaCPAN has a periodic sync it does, which is likely expensive, yeah. I
>>> think I wrote in the ability to get statistics for reports submitted
>>> "since" a certain date/time, and I think I remember that query being hard
>>> to optimize. We might want to think about getting the Percona Monitoring
>>> thing
>>> <https://www.percona.com/software/database-tools/percona-monitoring-and-management/query-analytics>
>>> going to get some query-level performance stats.
>>>
>>> The cpantesters3 system being down, though, likely had a bunch of follow-on
>>> effects: It was still in the Fastly proxy rotation for the API and Legacy
>>> Metabase services. I've removed it from those services, so at the very
>>> least Fastly won't forward traffic to a dead server.
>>>
>>> The load on cpantesters4, though, is still less than 1. That's got me
>>> thinking that CPU/memory aren't the bottleneck causing the current
>>> problems...
>>>
>>> The load on the primary db (db-primary-1.cpantesters.org) is hovering
>>> around 6 (with 16 cores). That might be causing at least some of the pain.
>>> Getting the PMM dashboard up and moving the full text reports back out of
>>> the database will probably do wonders for the load on the database server.
>>>
>>>
>>> Doug Bell
>>> d...@preaction.me
>>>
>>>
>>>> On Apr 26, 2025, at 1:26 AM, Slaven Rezic <sla...@rezic.de> wrote:
>>>>
>>>> On 25. 04. 2025 at 23:25, Scott Baker wrote:
>>>>
>>>>> It was brought up on IRC that one of the big consumers of the CPT API is
>>>>> probably MetaCPAN. This may be contributing to some of the load issues
>>>>> we're seeing.
>>>>>
>>>>> Would it be possible to temporarily disable this traffic while CPT is
>>>>> running in degraded mode?
>>>>>
>>>>> At this point I'll do anything to get CPT stable again. The API has been
>>>>> down for over 36 hours now and I really need to do some testing.
>>>>>
>>>> I am not sure what's going on... however, cpantesters3 (which maybe is
>>>> supposed to handle the API requests?) is down, and while cpantesters4
>>>> looks like it has an internal api service, it seems like it never worked.
>>>>
>>>> Regards,
>>>> Slaven