On 20 Apr 2015, at 19:10, Dennis Kuhn <[email protected]> wrote:
> 
> we have some replication issues. From time to time a doveadm-server
> process takes 100% cpu in the state recv_mailbox_tree_deletes on the
> replica. The process runs forever until it is manually killed. Strace on
> this process doesn't show anything.
> Sometimes we have several doveadm-server processes in this state, all
> for the same account, all with 100% CPU Load.

This looks like a bug, but there needs to be a way to reproduce it; otherwise
it's pretty much impossible to find what the bug is and get it fixed.

> My workaround is to delete the user directory on the replica so that
> the whole account is replicated again. This solves the problem for this
> specific account.

So killing the doveadm-server process will cause it to hang again for the same 
user? That's good, since it means it can be reproduced by taking a copy of the 
mailboxes and trying to run "doveadm sync" manually on them locally, e.g.:

doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2

Does that hang? If yes, we can get further with it. The -D parameter is also 
helpful here: v2.2.16 produces much more useful debug output for dsync, which 
can also help catch these kinds of hangs. Even if you can't reproduce the hang 
the above way, setting mail_debug=yes for dsync and getting the debug logs from 
a hanging session would be useful. (But be aware that a hang might then start 
flooding your logs with debug messages and eat up all the disk space.)
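
If it helps, here is a rough sketch of how the manual test could be wrapped so
that a hang is noticed automatically instead of having to watch the process.
The run_with_timeout helper, the time limit and the log path are all made up 
for illustration; only timeout(1) from GNU coreutils and the doveadm command 
above are assumed to exist.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and capture all of
# its output (including -D debug output) in a log file.
# Usage: run_with_timeout SECONDS LOGFILE COMMAND [ARGS...]
run_with_timeout() {
    limit=$1
    log=$2
    shift 2

    # GNU timeout sends SIGTERM after $limit seconds and exits with 124.
    timeout "$limit" "$@" >"$log" 2>&1
    status=$?

    if [ "$status" -eq 124 ]; then
        echo "hang: command still running after ${limit}s, debug log in $log"
    else
        echo "finished with exit status $status, debug log in $log"
    fi
    return "$status"
}
```

With copies of the two mailboxes in /tmp/mdbox1 and /tmp/mdbox2, it could be
called like this (600 seconds is an arbitrary limit, pick whatever is clearly 
longer than a normal sync of that account):

run_with_timeout 600 /tmp/dsync-debug.log \
    doveadm -D -o mail=mdbox:/tmp/mdbox1 sync mdbox:/tmp/mdbox2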
