Hi,

I tackle this problem in the following way:

My setup:
   1. One master site/datacentre, handling all connections: IMAPS and web-based 
DAV traffic.
   2. Two standby/backup datacentres kept current with sync_client replication.
   
On the master, imapd.conf holds a per-datacentre replication configuration 
(one sync channel per datacentre), ensuring a replica exists in both slave 
datacentres.
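For reference, the per-channel pieces of imapd.conf look roughly like this 
(channel names and hostnames here are placeholders; check the replication 
documentation for your Cyrus version):

```
# Rolling replication, one named sync channel per standby datacentre
sync_log: 1
sync_log_channels: dc2 dc3
dc2_sync_host: replica.dc2.example.com
dc3_sync_host: replica.dc3.example.com
```

Each channel then gets its own sync_client -r -n <channel> process on the 
master.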

I have a custom Python script that regularly checks the running processes to 
ensure `sync_client -r -n <channel defined in imapd.conf>` is running for each 
channel. (You could use a standard bash script; entirely up to you.)

If a per-datacentre sync_client process isn't running, a script executed 
periodically from cron forces it to run; if it is running, the script does 
nothing. This way, even if the master crashes its sync tasks for some reason, 
an independent script restarts them on whatever cron schedule you're 
comfortable with.
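The watchdog can be sketched like this (channel names and the sync_client path 
are assumptions; adjust both for your installation):

```python
#!/usr/bin/env python3
"""Cron-driven watchdog: make sure the per-channel rolling sync_client
is running, and start it if it is not.  CHANNELS and SYNC_CLIENT below
are placeholders -- adjust for your imapd.conf and distribution."""

import os
import subprocess

CHANNELS = ["dc2", "dc3"]                       # placeholder channel names
SYNC_CLIENT = "/usr/lib/cyrus/bin/sync_client"  # path varies by distribution


def is_running(channel: str, ps_output: str) -> bool:
    """Return True if a 'sync_client -r -n <channel>' process appears
    in the given `ps` output."""
    needle = f"sync_client -r -n {channel}"
    return any(needle in line for line in ps_output.splitlines())


def main() -> None:
    # Full command lines of every process on the box
    ps = subprocess.run(["ps", "axww", "-o", "args"],
                        capture_output=True, text=True, check=True).stdout
    for channel in CHANNELS:
        if not is_running(channel, ps):
            if os.path.exists(SYNC_CLIENT):
                # -r: rolling replication, -n: named channel
                subprocess.Popen([SYNC_CLIENT, "-r", "-n", channel])
            else:
                print(f"warning: {SYNC_CLIENT} not found; "
                      f"cannot start channel {channel}")


if __name__ == "__main__":
    main()
```

Drop it into cron at whatever interval you like; repeated runs are harmless 
because it only acts when a channel's process is missing.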

I also take it one step further and use my Ansible/automation node to 
sanity-check that the sync-client health-checker script is running and 
executing successfully, and to verify that the number of emails per mailbox 
on the master is within a defined tolerance of the count on the slaves (check 
the cyradm options; it can output some useful metrics, and with a little 
parsing you can sanity-check consistency between master and slave nodes). The 
key here is the tolerance/threshold: with this approach the master and slaves 
will almost always differ by at least a few messages, depending on how busy 
the server is and the delay between running the health check on the master 
and on the slave. If the difference exceeds what I'd expect, the playbook 
also emails me (all of the comparison work happens on the automation server). 
A watcher-of-the-watcher of sorts. It's not the most graceful solution on the 
planet, but it has served me well over the past few years.
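The tolerance comparison itself is simple once you have the counts. A minimal 
sketch, assuming you have already collected per-mailbox message counts from 
each side (e.g. by parsing cyradm output over ssh) into plain dictionaries:

```python
"""Master-vs-replica mailbox sanity check, as run from an automation
node.  How the {mailbox: message_count} dictionaries are gathered is up
to you; this only does the tolerance comparison."""


def out_of_tolerance(master: dict, replica: dict, tolerance: int = 5) -> dict:
    """Return {mailbox: (master_count, replica_count)} for every mailbox
    whose counts differ by more than `tolerance`, treating a mailbox
    missing on one side as a count of zero."""
    bad = {}
    for mbox in master.keys() | replica.keys():
        m = master.get(mbox, 0)
        r = replica.get(mbox, 0)
        if abs(m - r) > tolerance:
            bad[mbox] = (m, r)
    return bad


# Hypothetical example: "bob" has drifted far beyond the tolerance,
# "alice" is off by only a few messages, which is expected lag.
drift = out_of_tolerance({"alice": 100, "bob": 50},
                         {"alice": 97, "bob": 10})
```

If the returned dictionary is non-empty, the playbook fires off the alert 
email; anything within the threshold is treated as normal replication lag.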

Perhaps this approach is desirable if you don't want to cause outages for 
users; restarting the master is a bit aggressive.

There may be better solutions but this is one that works for me.

-A 
 
On 30/01/2023, at 5:51 PM, Deborah Pickett via Info <[email protected]> 
wrote:
> 
> Hi all,
> 
> A few random notes that might help future readers searching the list 
> archives, and a question at the end:
> 
> After one struggle too many with the XBACKUP feature, I've bitten the bullet 
> and switched to rolling replication to do my Cyrus backups. (So I won't be 
> raising any more issues about XBACKUP; thanks Ellie for all the prior help 
> with that.)
> 
> I'm now replicating from the main server (Debian 10 buster-backports, Cyrus 
> 3.2.6) to a backup server at another site (Debian 11 bullseye-backports, 
> Cyrus 3.6.0) within our VPN.
> 
> The Debian 10 buster-backports package is currently at 3.2.6, which would 
> normally be not recent enough to safely replicate or upgrade to 3.6.0, but 
> there's an explicit patch at 
> https://sources.debian.org/src/cyrus-imapd/3.2.6-2%2Bdeb11u2/debian/patches/prepare-3.6-upgrade.patch/
>  which ensures that the Debian package version applies a uniqueid to every 
> mailbox. I ran a manual check on the 3.2.6 server and confirmed that every 
> folder has a uniqueid and the minor version is 16. Nice!
> 
> Replication over bare IMAP runs perfectly. I couldn't get replication to 
> happen over IMAPS. I've got a Let's Encrypt certificate installed on the 
> replica, and it's installed and working, tested with imtest. But even 
> changing the sync_host configuration to "remote_host_fqdn:993/tls", which has 
> been reported to work by some users over the years, produced a TLS library 
> error ("Unable to get local issuer certificate") which I am guessing is 
> because sync_client can't see the root CA file. I didn't try any harder to 
> make this work; if I think that our VPN backbone is at risk then I'll put an 
> SSH tunnel in.
> 
> My backup plan now is to shut down cyr_master on the replica periodically, 
> take a filesystem snapshot offsite, and start it up again. The master (a live 
> server connected to by users) will pause replication while the replica is 
> offline, and resume when it comes back online. At least, that's the plan, but 
> I've found that if I just `systemctl stop cyrus-imapd` on the replica, then 
> sync_client on the master logs errors like:
>   cyrus/sync_client[35180]: Error in do_sync(): bailing out! Bad protocol
> and doesn't resume after I restart the replica. I end up having to `systemctl 
> restart cyrus-imapd` on the master, which resumes synchronization but results 
> in downtime for users.
> 
> So I've now got a line in /etc/imapd.conf:
>   MyChannelName_sync_shutdown_file: /var/lib/cyrus/sync/MyChannelName/shutdown
> and touching that file indeed causes sync_client to shut down gracefully, but 
> I don't know how to inform cyr_master to restart sync_client again, short of 
> `systemctl restart cyrus-imapd`, which again results in downtime for users. 
> Sending a SIGHUP doesn't seem to do anything. My sync_client entry is in the 
> cyrus.conf STARTUP section. If I moved it to the DAEMON section, it might 
> start up again too soon, so I don't want that.
> 
> Does anyone have a replication-as-backup methodology that avoids sync_client 
> crashing, keeps it offline while the replica is backing up, starts it up 
> again when the backup completes, doesn't stop the master from processing user 
> requests, and avoids race conditions? Thanks in advance.
> 
> --
> Deborah Pickett
> System Administrator
> Polyfoam Australia Pty Ltd

------------------------------------------
Cyrus: Info
Permalink: 
https://cyrus.topicbox.com/groups/info/T2846d85f9a3f91b8-Mc6bcc82c9748f8ddf9bbb1ec
Delivery options: https://cyrus.topicbox.com/groups/info/subscription