On 18/11/2020 19:37, Aakash Patel wrote:
Hello,
I have two mail servers and am also experiencing sporadic replication
errors over tcps, similar to Reuben. Each server is running Dovecot
2.3.11.3 (502c39af9) on Debian 10.6.
*Log entries from MX1*
Nov 18 00:39:26 mx1 dovecot:
dsync-local([email protected])<Ow3zAjWxtF+TDgAAPHKnuQ>: Error:
dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds
(last sent=mailbox, last recv=mailbox_state)
Nov 18 00:39:26 mx1 dovecot:
dsync-local([email protected])<Ow3zAjWxtF+TDgAAPHKnuQ>: Error: Timeout
during state=sync_mails (send=mailbox recv=mailbox)
Nov 18 06:39:32 mx1 dovecot:
dsync-local([email protected])<6bScGpwFtV+vEQAAPHKnuQ>: Error:
dsync(mx2.example.com): I/O has stalled, no activity for 600 seconds
(last sent=mailbox, last recv=mailbox_state)
Nov 18 06:39:32 mx1 dovecot:
dsync-local([email protected])<6bScGpwFtV+vEQAAPHKnuQ>: Error: Timeout
during state=sync_mails (send=mailbox recv=mailbox)
*End*
*Log entries from MX2*
Nov 18 00:29:55 mx2 dovecot:
dsync-local([email protected])<fKK3JzWxtF9zAgAA5XpYKg>: Error: Couldn't
lock /var/vmail/[email protected]/.dovecot-sync.lock:
fcntl(/var/vmail/[email protected]/.dovecot-sync.lock, write-lock,
F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held
by pid 628)
Nov 18 00:34:56 mx2 dovecot:
dsync-local([email protected])<9IKaB2KytF92AgAA5XpYKg>: Error: Couldn't
lock /var/vmail/[email protected]/.dovecot-sync.lock:
fcntl(/var/vmail/[email protected]/.dovecot-sync.lock, write-lock,
F_SETLKW) locking failed: Timed out after 30 seconds (WRITE lock held
by pid 628)
Nov 18 00:39:26 mx2 dovecot: doveadm: Error: dsync(mx1.example.com):
I/O has stalled, no activity for 600 seconds (last sent=mail_change
(EOL), last recv=mailbox)
Nov 18 06:39:32 mx2 dovecot: doveadm: Error: dsync(mx1.example.com):
I/O has stalled, no activity for 600 seconds (last sent=mail_change
(EOL), last recv=mailbox)
*End*
I have configured "replication_full_sync_interval = 1 hours", which
explains why some of the sync errors, when they do occur, show up at
the same minute past the hour (e.g. 00:39 and 06:39 above).
I've tested replication over tcps using either IPv6 or IPv4 -- this
did not appear to make a difference.
Switching replication to plain tcp (with "ssl = yes" commented out as
well) solves the issue.
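
For context, a tcps replication setup of the kind described here typically looks something like the sketch below. It is not copied from the attached "dovecot -n" output: the port number, password and CA directory are placeholder assumptions, and the "ssl = yes" shown in the listener is presumably the line referred to above.

```
service doveadm {
  inet_listener {
    # placeholder port, must match doveadm_port on the other server
    port = 12345
    # presumably the line that gets commented out for plain tcp
    ssl = yes
  }
}
doveadm_port = 12345
# placeholder shared secret
doveadm_password = secret

# CA certificates used to verify the remote server's certificate
ssl_client_ca_dir = /etc/ssl/certs

plugin {
  # for plain tcp this would be tcp:mx2.example.com instead
  mail_replica = tcps:mx2.example.com
}
```
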
IMAP clients are primarily connecting to MX1 using SSL, which works
well (SSL connections to MX2 also work). These are very low traffic
machines at the moment (just 1 user as I continue testing).
I've attached the output of "dovecot -n" from each server.
Are there known bugs with replication using SSL? I'd appreciate any
guidance.
Thank you,
AP
For what it's worth, I had the same issue when setting this up a few
weeks ago. I switched to using SSH-based transport and it's been great
ever since. Is that an option for you?
dsync_remote_cmd = ssh -l%{login} %{host} doveadm dsync-server -u%u
mail_replica = remote:[email protected]
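
As a rough sketch of where these two settings would sit in dovecot.conf: dsync_remote_cmd is a global setting, while mail_replica goes inside the plugin block. This assumes the rest of the replication setup (replicator and aggregator services, notify/replication plugins) stays as it was for tcp, and the login and host below are placeholders rather than real values. The user that runs dsync would also need non-interactive (key-based) SSH access to the other host.

```
# Global setting: the command dsync runs to reach the other side over SSH.
# %{login} and %{host} are filled in from the login@host part of mail_replica.
dsync_remote_cmd = ssh -l%{login} %{host} doveadm dsync-server -u%u

plugin {
  # The "remote:" prefix makes dsync use dsync_remote_cmd
  # instead of connecting to the doveadm TCP port.
  # vmail@mx2.example.com is a placeholder login@host.
  mail_replica = remote:vmail@mx2.example.com
}
```
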
Cheers
James