Indeed, we have a `db-replica-1` server, but it (a) looks like it is currently
broken, and (b) was designed specifically for read-only backups using ZFS delta
snapshotting. This is one of the things that ServerCentral set up for us that
is _far beyond_ my knowledge/expertise.

Doug Bell
d...@preaction.me



> On Apr 28, 2025, at 10:44 AM, Scott Baker <sc...@perturb.org> wrote:
> 
> Karen:
> 
> That is one of the things on the table for discussion. However, the current
> CPT DB is 1.5 TB, which makes backups and replication a little more
> complicated. We're hoping to come up with a plan for the health and longevity
> of CPT at the PTC summit this year.
> 
> -- Scottchiefbaker
> 
> On 4/27/25 5:52 PM, Karen Etheridge wrote:
>> You might want to consider running a second database that is a read-only 
>> replica, and pointing a separate instance of the cpantesters API at that, 
>> for serving the stats for metacpan -- that way any excessive db load from 
>> that query will not disrupt the remaining systems.
>> 
>> On Sun, Apr 27, 2025 at 5:49 PM Doug Bell <d...@preaction.me> wrote:
>>> MetaCPAN has a periodic sync it does, which is likely expensive, yeah. I
>>> think I added the ability to get statistics for reports submitted
>>> "since" a certain date/time, and I seem to remember that query being hard
>>> to optimize. We might want to think about getting Percona Monitoring and
>>> Management (PMM)
>>> <https://www.percona.com/software/database-tools/percona-monitoring-and-management/query-analytics>
>>> going so we can get some query-level performance stats.
>>> 
>>> The cpantesters3 system being down, though, likely had a bunch of follow-on 
>>> effects: It was still in the Fastly proxy rotation for the API and Legacy 
>>> Metabase services. I've removed it from those services, so at the very 
>>> least Fastly won't forward traffic to a dead server.
>>> 
>>> The load on cpantesters4, though, is still less than 1. That's got me 
>>> thinking that CPU/memory aren't the bottleneck causing the current 
>>> problems...
>>> 
>>> The load on the primary DB (db-primary-1.cpantesters.org) is hovering
>>> around 6 (with 16 cores). That might be causing at least some of the pain.
>>> Getting the PMM dashboard up and moving the full-text reports back out of
>>> the database will probably do wonders for the load on the database server.
>>> 
>>> 
>>> Doug Bell
>>> d...@preaction.me
>>> 
>>> 
>>> 
>>>> On Apr 26, 2025, at 1:26 AM, Slaven Rezic <sla...@rezic.de> wrote:
>>>> 
>>>> On 25.04.2025 at 23:25, Scott Baker wrote:
>>>> 
>>>>> It was brought up on IRC that one of the big consumers of the CPT API is 
>>>>> probably MetaCPAN. This may be contributing to some of the load issues 
>>>>> we're seeing.
>>>>> 
>>>>> Would it be possible to temporarily disable this traffic while CPT is 
>>>>> running in degraded mode?
>>>>> 
>>>>> At this point I'll do anything to get CPT stable again. The API has been 
>>>>> down for over 36 hours now and I really need to do some testing.
>>>>> 
>>>> I am not sure what's going on... however, cpantesters3 (which is perhaps
>>>> supposed to handle the API requests?) is down, and while cpantesters4
>>>> looks like it has an internal API service, it seems like it never worked.
>>>> Regards,
>>>>     Slaven
>>>> 
>>> 
