No, the test to bug out doesn't work because net_geterror(proxy->fd_ssl)
returns 0 in the statement
if (!proxy->client_proxy &&
net_geterror(proxy->fd_ssl) == ENOTCONN) {
However, errno is indeed ENOTCONN. Changing the test to
if (!proxy->client_proxy &&
errno == ENOTCONN) {
does do the job (i.e., stopping the process with the "syscall failed" error
message), though I'm sure I don't understand the ins and outs well enough to
know if that is an appropriate test. The full log line is:
Nov 29 07:39:14 SERVER dovecot[24362]: imap-login: Disconnected
(disconnected before greeting, waited 0 secs): user=<>, rip=CLIENTIP,
lip=SERVERIP, TLS handshaking: SSL_accept() syscall failed: Socket is not
connected, session=<d5DNFaTP8QAKAAFi>
Thoughts?
On Nov 28, 2012, at 10:18PM PST, Timo Sirainen <[email protected]> wrote:
> This is either OSX bug or OpenSSL bug.. Apparently what happens is:
>
> 1. Client sends SYN packet to Dovecot
> 2. Dovecot accept()s the connection (sends SYN-ACK) and goes into OpenSSL code
> 3. Client doesn't send ACK to Dovecot. Does it send RST or nothing or
> something else? I don't know.
> 4. OSX notices anyway that something is wrong with the socket, and kqueue
> says that the socket is ready for reading
> 5. OpenSSL read()s, which fails with ENOTCONN. But OpenSSL thinks this is a
> non-fatal error and simply asks to be notified again when something can be
> read
> 6. goto 4
>
> So, whose bug is it? OpenSSL's ENOTCONN handling probably makes sense for
> client connections where connect() hasn't finished yet. But then again, this
> is accept()ed connection where it typically should fail like that. Except I
> guess it might be correct behavior if read() is done after SYN-ACK but before
> receiving ACK.
>
> While OSX is receiving ACK from the client, it shouldn't say that the fd is
> readable. It probably doesn't. But after it receives <something> it realizes
> that the socket is disconnected. So read() probably shouldn't be returning
> ENOTCONN anymore at this point, but instead ECONNRESET or ETIMEDOUT.
>
> See if the attached patch helps.
>
>
> On 29.11.2012, at 7.45, Erik A Johnson wrote:
>> Here's the log:
>>
>> Nov 28 21:28:11 macbookpro-e17d.home dovecot[54139]: master: Dovecot v2.1.10
>> starting up (core dumps disabled)
>> Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug:
>> ssl_step()
>> Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug:
>> ssl_handshake: SSL_accept()=-1
>> Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug:
>> SSL_get_error() = 2
>> Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug: -
>> want_read
>> Nov 28 21:30:19 macbookpro-e17d.home dovecot[54141]: imap-login: Debug:
>> ssl_set_io(0)
>> [last 5 lines are repeated until process is killed]
>>
>> On Nov 26, 2012, as 11:38PM PST, Timo Sirainen <[email protected]> wrote:
>>>
>>> Could you try with the attached patch, and with only the problematic
>>> client running? What does it log (the beginning of the session until it
>>> starts repeating the same lines)?
>>>
>>> On 10.11.2012, at 12.44, Erik A Johnson wrote:
>>>> imap-login processes are hanging (using 100% of CPU) when connected from a
>>>> client that is partially blocked by a firewall. It appears that
>>>> imap-login is stuck in a loop trying to complete an ssl handshake.
>>>> imap-login is working fine for other clients not blocked by the firewall
>>>> (including localhost).
>>>>
>>>> This is dovecot 2.1.10 under Mac OS X 10.8.2 (compiled from sources); the
>>>> firewall is Little Snitch 3.0.1 blocking port 993, which appears to let
>>>> the connection initiate but then squashes and disconnects the socket
>>>> during ssl handshaking.
>>>>
>>>> gdb backtrace and Activity Monitor's "Sample Process" show that imap-login
>>>> is stuck calling ioloop-kqueue's io_loop_handler_run -> io_loop_call_io ->
>>>> ssl_step repeatedly; dtruss shows that it is repeatedly making system
>>>> calls to kevent and read, the latter returning -1 with errno
>>>> 57=ENOTCONN="Socket is not connected". (I also tried ./configure
>>>> --with-ioloop=poll and --with-iopoll=select instead of the default best =
>>>> kqueue but the results were the same; --with-iopoll=epoll didn't work
>>>> because epoll is not available on this machine.) The client, initiated by
>>>> the command "openssl s_client -connect SERVER:993", first responds
>>>> "CONNECTED(00000003)" but then immediately the error
>>>> "60278:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake
>>>> failure:/SourceCache/OpenSSL098/OpenSSL098-44/src/ssl/s23_lib.c:182:".
>>>> The infinite loop is in src/lib/ioloop.c in the function "io_loop_run"
>>>> where the statement "while (ioloop->running) io_loop_handler_run(ioloop)"
>>>> is executed.
>
>