Re: [Dovecot] Director proxy timeout

2010-07-14 Thread Brandon Davidson



On 7/13/10 4:53 AM, Timo Sirainen t...@iki.fi wrote:
 
 Hmm. Between? Is it doing CAPABILITY before or after login or both? That
 anyway sounds different from the idle timeout problem..

I added some additional logging to imapproxy and it looks like it's actually
getting stuck in a few different commands. It just depends on what it's
trying to do when the connection gets wedged.

What I'm seeing is that from time to time an imapproxy - imap-login proxy
connection will get stuck and cease responding to commands. After a while
the PHP client will timeout and give up, after which the stuck connection
goes back to the pool, and continues to get reused and cause hangs until I
either restart imapproxy or kill off the imap-login proxy that the stuck
socket is connected to.

If I attach to the stuck imap-login process, it's waiting in:
#0  0x00385c0c6070 in __write_nocancel () from /lib64/libc.so.6
#1  0x003c5620c9a1 in login_proxy_state_notify () from
/usr/lib64/dovecot/libdovecot-login.so.0
#2  0x003c5620c026 in login_proxy_notify () from
/usr/lib64/dovecot/libdovecot-login.so.0
#3  0x003c55e52521 in io_loop_handle_timeouts_real () from
/usr/lib64/dovecot/libdovecot.so.0
#4  0x003c55e5257b in io_loop_handle_timeouts () from
/usr/lib64/dovecot/libdovecot.so.0
#5  0x003c55e5373c in io_loop_handler_run () from
/usr/lib64/dovecot/libdovecot.so.0
#6  0x003c55e525c1 in io_loop_run () from
/usr/lib64/dovecot/libdovecot.so.0
#7  0x003c55e3b896 in master_service_run () from
/usr/lib64/dovecot/libdovecot.so.0
#8  0x003c5620dc4b in main () from
/usr/lib64/dovecot/libdovecot-login.so.0
#9  0x00385c01d994 in __libc_start_main () from /lib64/libc.so.6
#10 0x00402019 in _start ()

If I tcpdump the stuck connection, I can see that imapproxy sends something
to the imap-login proxy when new clients are connected, but I'm not sure
what since it's SSL encrypted. The response is an empty ack packet. I'm
going to try disabling SSL between imapproxy and the director to see if I
can figure out what it's sending.

All in all I'm having a hard time debugging it since it only seems to happen
when there are a decent number of users active. I'm not at all convinced
that it's dovecot's fault, but if you have any suggestions or things that I
could to to see what the imap-login proxy or backend think is going on I'd
be much in your debt.

-Brad



Re: [Dovecot] Director proxy timeout

2010-07-14 Thread Timo Sirainen
On 14.7.2010, at 9.55, Brandon Davidson wrote:

 If I attach to the stuck imap-login process, it's waiting in:
 #0  0x00385c0c6070 in __write_nocancel () from /lib64/libc.so.6
 #1  0x003c5620c9a1 in login_proxy_state_notify () from
 /usr/lib64/dovecot/libdovecot-login.so.0

Oh, interesting. Login processes are notifying director every once in a while 
about connections. And looks like director isn't reading the replies, at least 
fast enough.. Did you check how long it was struck in this write? Forever?

I can easily change this write to be nonblocking and just retry later instead 
of hanging, but there is still a bug if director never reads it..



Re: [Dovecot] Director proxy timeout

2010-07-14 Thread Timo Sirainen
On Wed, 2010-07-14 at 12:30 +0100, Timo Sirainen wrote:
 On 14.7.2010, at 9.55, Brandon Davidson wrote:
 
  If I attach to the stuck imap-login process, it's waiting in:
  #0  0x00385c0c6070 in __write_nocancel () from /lib64/libc.so.6
  #1  0x003c5620c9a1 in login_proxy_state_notify () from
  /usr/lib64/dovecot/libdovecot-login.so.0
 
 Oh, interesting. Login processes are notifying director every once in a while 
 about connections. And looks like director isn't reading the replies, at 
 least fast enough.. Did you check how long it was struck in this write? 
 Forever?
 
 I can easily change this write to be nonblocking and just retry later instead 
 of hanging, but there is still a bug if director never reads it..

This should fix it: http://hg.dovecot.org/dovecot-2.0/rev/510b627687f8



Re: [Dovecot] Director proxy timeout

2010-07-14 Thread Brad Davidson
Timo,

 -Original Message-
 From: Timo Sirainen [mailto:t...@iki.fi]

 I can easily change this write to be nonblocking and just retry later
 instead of hanging, but there is still a bug if director never reads it..
 
 This should fix it: http://hg.dovecot.org/dovecot-2.0/rev/510b627687f8

So far so good! I upgraded the directors and switched the webmail system back 
over a few hours ago and it has yet to get stuck. So glad it was something 
fairly easy to fix, I was beginning to despair that it was an imapproxy problem 
that I'd have to track down and fix myself.

Makes sense that it was the notify socket, I had noticed that the 'doveadm 
director status' user numbers would drop off significantly after it had been 
running for a while, but I didn't know what to correlate that too.

-Brad


Re: [Dovecot] Director proxy timeout

2010-07-13 Thread Timo Sirainen
On 12.7.2010, at 22.13, Brad Davidson wrote:

 It does look like all of the Maximum execution time of 120 seconds exceeded 
 errors logged by Apache are within Roundcube's GetCapability readLine loop. 
 There have been problems in the past with Roundcube's IMAP code going off the 
 deep end if it loses its IMAP connection. I'll see if I can't do something 
 about that, but it would also be nice if I could figure out why I'm getting 
 disconnected between LOGIN and CAPABILITY in the first place.

Hmm. Between? Is it doing CAPABILITY before or after login or both? That 
anyway sounds different from the idle timeout problem..



Re: [Dovecot] Director proxy timeout

2010-07-12 Thread Timo Sirainen
On Sun, 2010-07-11 at 21:35 -0700, Brad Davidson wrote:

 Is the director proxy inactivity timeout configurable?

Proxy has no inactivity timeout.

 I just recently
 attempted to switch our production webmail's imapproxy system to use a
 pool of two directors and ran into problems with it apparently
 disconnecting them unexpectedly. I don't believe that it IDLEs, I think
 it just UNSELECTs to reset the state and then keeps the connection open
 until it's used again, or until a configurable delay has elapsed, after
 which it logs out on its own.

Maybe it's a firewall or something disconnecting them?



Re: [Dovecot] Director proxy timeout

2010-07-12 Thread Brad Davidson
Timo,

 -Original Message-
 From: Timo Sirainen [mailto:t...@iki.fi]

 Is the director proxy inactivity timeout configurable?
 
 Proxy has no inactivity timeout.
 Maybe it's a firewall or something disconnecting them?

That's odd. I'm trying to figure out what's going on. I didn't run in to any 
problems with it on my test box, but within a few hours of enabling it on the 
production system I had Apache processes backing up. It's rather hard to 
troubleshoot there are now so many moving parts.

It does look like all of the Maximum execution time of 120 seconds exceeded 
errors logged by Apache are within Roundcube's GetCapability readLine loop. 
There have been problems in the past with Roundcube's IMAP code going off the 
deep end if it loses its IMAP connection. I'll see if I can't do something 
about that, but it would also be nice if I could figure out why I'm getting 
disconnected between LOGIN and CAPABILITY in the first place.

-Brad


[Dovecot] Director proxy timeout

2010-07-11 Thread Brad Davidson
Timo,

Is the director proxy inactivity timeout configurable? I just recently
attempted to switch our production webmail's imapproxy system to use a
pool of two directors and ran into problems with it apparently
disconnecting them unexpectedly. I don't believe that it IDLEs, I think
it just UNSELECTs to reset the state and then keeps the connection open
until it's used again, or until a configurable delay has elapsed, after
which it logs out on its own.

I've got the imapproxy inactivity delay set to 300 seconds, and it seems
like the director is disconnecting them before that time is up. I
haven't had the same problem when imapproxy connects directly to the
backend servers, which are running 1.2.12.

On the webmail system, imapproxy is logging a LOT of:
Jul 11 21:10:21 cc-mailapps1 in.imapproxyd[28477]: IMAP_Line_Read():
connection closed prematurely.

The director shows:
Jul 11 21:10:14 cc-popmap7p dovecot: imap-login: proxy(jacintha):
disconnecting 172.25.142.164

The backend server shows:
Jul 11 21:10:14 cc-popmap2p dovecot: imap: user=jacintha,
rip=10.142.0.162, pid=8629: Connection closed bytes=855/46109 

Roundcube seems to handle the disconnects pretty badly, leaving a bunch
of Apache processes chewing up CPU time.

-Brad