Hello,

Upgrading to 2.3.9 unfortunately does *not* solve this issue:

I upgraded one of my replicators from 2.3.7.2 to 2.3.9, and after a few seconds replication stopped. The other replicator remained on 2.3.7.2. After downgrading to 2.3.7.2, replication is working fine again.

I have not yet tried upgrading both replicators, as this is a live production system. Is there a chance that upgrading both replicators will solve the problem?

The machines are running Ubuntu 18.04.

Any help is appreciated.

Thanks,
Andreas

On 18.10.19 at 13:52, Carsten Rosenberg via dovecot wrote:
Hi,

some of our customers have discovered a replication issue after
upgrading from 2.3.7.2 to 2.3.8.

With 2.3.8, several replication connections hang until the configured
timeout. So after a few seconds there are $replication_max_conns hanging
connections.
Other replications run fast and complete successfully.

Also, running doveadm sync tcp:... works fine for all users.

I can't tell for certain, but I haven't seen the same mailboxes timing
out again and again. So I would assume it's not related to the mailbox.

From the logs:

server1:

Oct 16 08:29:25 server1 dovecot[5715]: dsync-local(userna...@domain.com)<FXnVDW22pl0tGAAA1cwDxA>: Error: dsync(172.16.0.1): I/O has stalled, no activity for 600 seconds (version not received)
Oct 16 08:29:25 server1 dovecot[5715]: dsync-local(userna...@domain.com)<FXnVDW22pl0tGAAA1cwDxA>: Error: Timeout during state=master_recv_handshake

server2:

Oct 16 08:29:25 server2 dovecot[8113]: doveadm: Error: read(server1) failed: EOF (last sent=handshake, last recv=handshake)

There aren't any additional logs regarding the replication.
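
One way to check the assumption that the stalls are not tied to particular
mailboxes is to tally the "I/O has stalled" errors per user. The following is
only a rough sketch in Python, not part of Dovecot: the regular expression is
modelled on the dsync-local lines quoted above and assumes each log entry sits
on a single line, and the names count_stalls and repeat_offenders are
hypothetical helpers made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the dsync-local error lines quoted in this thread.
# Assumption: each log entry is on one line; other setups may log differently.
STALL_RE = re.compile(
    r"dsync-local\((?P<user>[^)]*)\)<(?P<session>[^>]*)>: Error: "
    r"dsync\((?P<peer>[^)]*)\): I/O has stalled"
)


def count_stalls(lines):
    """Return a Counter mapping user -> number of stalled dsync sessions."""
    stalls = Counter()
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            stalls[match.group("user")] += 1
    return stalls


def repeat_offenders(stalls, threshold=2):
    """Return the sorted users whose replication stalled at least `threshold` times."""
    return sorted(user for user, n in stalls.items() if n >= threshold)
```

Feeding the function an open log file (for example count_stalls(open(path)),
where path is wherever your syslog writes Dovecot messages) gives the per-user
counts. If repeat_offenders() stays empty over a longer period, that would
support the view that the problem is not related to specific mailboxes.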

I have tried increasing vsz_limit or reducing replication_max_conns.
Nothing changed.

--

Both customers have 10k+ users. So far I haven't been able to reproduce
this on smaller test systems.

Both installations were downgraded to 2.3.7.2 to fix the issue for now.

--

I've attached a tcpdump showing that the client stops sending any data
after the mailbox_guid table headers.



Any idea what could be wrong here or how to debug this issue?

Thanks.

Carsten Rosenberg



--
________________________________________________________________________
Dr. Andreas Piper, Hochschulrechenzentrum der Philipps-Univ. Marburg
          Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 28-23521  Fax: -26994  E-Mail: pi...@hrz.uni-marburg.de
