I've found that mbsync occasionally "gets stuck": it does no work, as far
as I can tell from CPU and network activity, and does not exit until I kill
it, typically some hours after it became stuck. "Occasionally" means once
every few weeks to months. This is a problem for me because a daemon runs
mbsync in the background and waits for each run to finish before starting
the next, so mail syncing stops until I notice the problem and kill mbsync.

I've now caught one hang with an mbsync compiled with debug symbols, so I
finally have a backtrace, which I hope will be useful. Here's the console
output from the mbsync run (CVS HEAD from 2008-09-01):

$ mbsync frostnet:INBOX frostnet:sent-mail
Reading configuration file /home/chris/.mbsyncrc
Channel frostnet
Opening master frostnet...
Resolving mail.frostnet.net... ok
Connecting to 216.151.149.52:993... ok
Connection is now encrypted
Logging in...
Opening slave local...
Selecting master INBOX...

Ten minutes after mbsync started I noticed the above. At that point there
was no active network traffic. The backtrace at that point:

#0  0xb7f3d410 in __kernel_vsyscall ()
#1  0xb7bd4273 in __read_nocancel () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7d10d37 in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#3  0xb7d0edb1 in BIO_read () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#4  0xb7dd4362 in ssl3_read_n () from /usr/lib/i686/cmov/libssl.so.0.9.8
#5  0xb7dd4b2e in ssl3_read_bytes () from /usr/lib/i686/cmov/libssl.so.0.9.8
#6  0xb7dd2096 in ssl3_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
#7  0xb7de2b78 in SSL_read () from /usr/lib/i686/cmov/libssl.so.0.9.8
#8  0x080517b3 in socket_read (sock=0x805d404, 
    buf=0x212 <Address 0x212 out of bounds>, len=134677435) at drv_imap.c:387
#9  0x080518a6 in buffer_gets (b=0x805d404, s=0xbf9c3718) at drv_imap.c:469
#10 0x08051f7b in get_cmd_result (ctx=0x805d398, tcmd=0x80a71c8)
    at drv_imap.c:1000
#11 0x0805321d in imap_exec_b (ctx=0x805d398, cmdp=0x212, 
    fmt=0x805a394 "UID FETCH %d:%d (UID%s%s)") at drv_imap.c:594
#12 0x080537b5 in imap_select (gctx=0x805d398, minuid=1, maxuid=1000000000, 
    excs=0x0, nexcs=0, cb=0x804f360 <box_selected>, aux=0x8078a60)
    at drv_imap.c:1577
#13 0x0804c16e in select_box (svars=0x8078a60, t=0, minwuid=1, mexcs=0x0, 
    nmexcs=0) at sync.c:846
#14 0x0804cf98 in sync_boxes (ctx=0xbf9c3dc4, names=0xbf9c3ddc, 
    chan=0x805d2f0, cb=0x804abe0 <done_sync>, aux=0xbf9c3db0) at sync.c:820
#15 0x0804a79f in sync_chans (mvars=0xbf9c3db0, ent=530) at main.c:605
#16 0x0804b3cc in main (argc=3, argv=0xbf9c3ec4) at main.c:481

This is on an Ubuntu 8.04 i386 machine (OpenSSL 0.9.8).

I believe I have seen variations on the above console output when mbsync
appeared to be in a stuck state. For example, I know that at least once the
last line instead read:
        Selecting master INBOX... Timeout, server not responding.
This apparent hang was also in SSL_read()..__kernel_vsyscall(), but I
cannot compare the mbsync part of the stack because that mbsync run was
without debug symbols.

I remember that when I was investigating the mbsync code some time ago I
was concerned about the blocking reads, but I did not get far enough into
how OpenSSL works to know whether blocking reads could cause hangs.
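
To illustrate the kind of guard I have in mind, here is a minimal sketch of
a read that gives up after a timeout instead of blocking forever. It is not
mbsync code: read_with_timeout() and its return convention are hypothetical,
and it works on a plain file descriptor. With OpenSSL one would presumably
also need to check SSL_pending() before polling, since decrypted data can
already be buffered inside the SSL layer.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of mbsync): wait up to timeout_ms
 * milliseconds for fd to become readable, then read from it.
 * Returns the number of bytes read, 0 on EOF, -1 on error and
 * -2 if nothing arrived before the timeout.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts poll(); note this restarts the full
     * timeout, which is good enough for a sketch. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;              /* poll() itself failed */
    if (ready == 0)
        return -2;              /* timed out, peer sent nothing */

    return read(fd, buf, len);  /* readable or hung up: will not block */
}
```

A caller would treat -2 as "server not responding" and close the connection
rather than wait indefinitely.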


thanks in advance!
-- 
Chris Frost  |  http://www.frostnet.net/chris/
-------------+--------------------------------
PGP: http://www.frostnet.net/chris/pgpkey.txt

_______________________________________________
isync-devel mailing list
isync-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/isync-devel
