Re: SSL Handshake errors

Andrei Marinescu Mon, 08 Jul 2013 10:43:33 -0700

I finally managed to track down the issue; the cause was much simpler than I had thought.

As I've mentioned before, the service exposed through this HAProxy instance is mainly accessed by mobile devices. The errors appeared when apps were closed (either manually or because of a crash) while an HTTPS connection was being established (we're doing a final API call when the app is being closed, for example). I've managed to replicate this behavior reliably. This also triggered some BADREQ errors when the SSL connection was established but no data was ever sent.
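
For anyone who wants to provoke the same kind of log entry without a phone at hand, here is a minimal sketch in Python. It is not what the apps actually do; it only imitates a client that sends its ClientHello and then goes away before the handshake finishes. The function names and the default SNI value "example.invalid" are made up for illustration.

```python
import socket
import ssl


def client_hello_bytes(server_hostname: str) -> bytes:
    """Build a TLS ClientHello in memory, without touching the network."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the handshake now waits for a ServerHello we never feed it.
        pass
    return outgoing.read()


def abort_mid_handshake(host: str, port: int,
                        server_hostname: str = "example.invalid") -> int:
    """Send only the ClientHello, then close, like an app killed mid-connect.

    Returns the number of bytes sent.
    """
    hello = client_hello_bytes(server_hostname)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(hello)
    # Leaving the 'with' block closes the socket before any reply is read.
    return len(hello)
```

Pointing `abort_mid_handshake` at a test load balancer (never a production one) should, under these assumptions, leave the server with a handshake that the client abandoned.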

The reason we didn't detect this earlier is that AWS ELB didn't offer any logging, and the CloudWatch metrics only covered HTTP return statuses (2XX, 4XX and 5XX). Of course, these cases never produce an HTTP response, so they didn't show up in any of those.

Thanks again to everyone for your support!

Best,

Andrei

Emeric Brun

July 8, 2013 12:29 PM

On 07/08/2013 11:06 AM, Andrei Marinescu wrote:
Hi Lukas,

Unfortunately I'm not able to reproduce this on any of the devices I
have access to. I'm just seeing these errors in the logs and I'm trying
to track down the issue. I guess I'll try to find an easy-to-reproduce
scenario and return with a cap file at that time.

Just so that I can delete one possibility from my list, is it possible
that some devices reject the certificate I'm using? I'm thinking of this
because I ran into an issue with this CA on another server (a payment
gateway wouldn't connect over HTTPS, problem solved by changing the
cert). 99% of the devices connecting to this endpoint are Android and
iOS devices, and given the fragmentation that Android suffers from,
this wouldn't surprise me.

Thanks everyone!

Best,

Andrei
Lukas Tribus
July 8, 2013 11:46 AM
Hi Andrei,

I only see a single session of that IP in the cap file.

What we can see from the dump is:
- the client provides both a TLS session ticket and a session ID
- the server acknowledges the session ID
- the server sends a "Change Cipher Spec" message [1]
- the client disconnects

I don't think this is enough information to draw a conclusion. A wild
guess could be that the client gets upset about the Change Cipher Spec
message, but that is really a very wild guess.

We would need to see the sessions before and after this one to be able to
put them in context. Any additional information about the User-Agent
would certainly also help.

Btw, can you clearly reproduce this, or is this a random failed session on
your production box?

Regards,

Lukas

[1]
http://de.wikipedia.org/wiki/Transport_Layer_Security#TLS_Change_Cipher_Spec_Protocol

Willy Tarreau
July 8, 2013 9:40 AM
Hello Andrei,

That would definitely help, in order to pass it via ssldump. Or you can
do it yourself as well. What I'm seeing anyway (-q wasn't the most helpful
option here :-)) is that the client closes first. The sequence looks like
this :

    client               SYN                server
  port 58713   ----------------------->    :443
                       SYN/ACK
               <-----------------------
                         ACK
               ----------------------->
          PSH: TLSv1 client hello with SNI
               ----------------------->
          PSH: TLSv1 server hello
               <-----------------------
          FIN: client decides to close
               ----------------------->
          FIN: server acknowledges and closes
               <-----------------------
          RST: client had already closed
               ----------------------->

So in short, the client disagrees with what the server proposed. Either
it's because of the algorithms in use, or because something is missing.
For example, I'm not seeing any certificate presented by the server, so
it looks like session resumption.
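
The "no certificate, so probably resumption" reasoning can be checked mechanically once the raw bytes of the server's flight are extracted from a capture. The following is only a sketch, assuming unencrypted TLS 1.0-1.2 records and handshake messages that are not fragmented across records; the function names are invented for this example.

```python
import struct
from typing import List

HANDSHAKE, CHANGE_CIPHER_SPEC = 22, 20
HANDSHAKE_NAMES = {
    1: "ClientHello", 2: "ServerHello", 4: "NewSessionTicket",
    11: "Certificate", 12: "ServerKeyExchange", 14: "ServerHelloDone",
}


def summarize_flight(data: bytes) -> List[str]:
    """List the TLS messages found in a raw byte stream, in order."""
    events = []
    pos = 0
    encrypted = False  # after ChangeCipherSpec, handshake bodies are opaque
    while pos + 5 <= len(data):
        rtype, _version, length = struct.unpack_from("!BHH", data, pos)
        payload = data[pos + 5:pos + 5 + length]
        pos += 5 + length
        if rtype == CHANGE_CIPHER_SPEC:
            events.append("ChangeCipherSpec")
            encrypted = True
        elif rtype == HANDSHAKE and encrypted:
            events.append("EncryptedHandshake")
        elif rtype == HANDSHAKE:
            off = 0
            while off + 4 <= len(payload):
                mtype = payload[off]
                mlen = int.from_bytes(payload[off + 1:off + 4], "big")
                events.append(HANDSHAKE_NAMES.get(mtype, "handshake(%d)" % mtype))
                off += 4 + mlen
        else:
            events.append("record(%d)" % rtype)
    return events


def looks_like_resumption(events: List[str]) -> bool:
    """ServerHello followed by ChangeCipherSpec with no Certificate in between."""
    return ("ServerHello" in events
            and "ChangeCipherSpec" in events
            and "Certificate" not in events)
```

A full handshake would show ServerHello, Certificate and ServerHelloDone before any ChangeCipherSpec; an abbreviated one jumps straight from ServerHello to ChangeCipherSpec, which matches what was seen in the dump.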

Ssldump would tell us what algorithms were negotiated in each direction.
You can also try with tshark/wireshark I think.

Best regards,
Willy

Andrei Marinescu
July 8, 2013 9:16 AM
Hello Willy,

Thank you for your answer! I've attached a dump with two requests from
the same ip. First one failed with Connection closed during SSL
handshake, the second one failed with Timeout during SSL handshake.

I've translated the .cap file with tcpdump -qns 0 -X -r file.cap >
translated.cap in order to make the dump readable and extract the two
requests. If the original dump is needed, let me know and I'll attach
it a.s.a.p.

Willy Tarreau
July 7, 2013 10:02 PM
Hello Andrei,

It's very hard to suggest anything unfortunately, since most SSL/TLS
errors can be very cryptic. It would be nice if you could take a pcap
capture of one such faulty connection so that we can see the whole
handshake and try to find what the issue is. Many things can be involved,
including versions, algorithms, key sizes, etc...

In order to take this capture, please use
"tcpdump -s0 -npi eth0 -w file.cap" to ensure that packets are not
truncated. If you'd prefer not to reveal your public IP address on the
list, then please send me the capture in private.
But I must say that people here on the list tend to read SSL traces faster
than me :-)

Regards,
Willy

Andrei Marinescu
July 7, 2013 6:08 PM
Hello everyone!

I've moved off AWS ELB today to HAProxy 1.5dev18. I'm doing SSL
termination at the LB and I'm encountering a rather large number of
messages such as:
- SSL Handshake failure
- Timeout during SSL handshake
- Connection closed during SSL handshake

The problem is similar to the one I've found in the archives about 2
weeks ago (http://marc.info/?l=haproxy&m=137158875803495&w=2), but
unfortunately I'm unable to debug this. I'm trying to clarify if these
are errors that are normal and I just didn't see on ELB, or if there's
anything to do to better configure HAProxy. As far as I can see in the
logs, some hosts are able to connect successfully sometimes, and with
errors other times. Hosts that have errors tend to have more errors
than successful requests. Also, almost all of the devices accessing this
service are Android and iOS devices.
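
To put numbers on "some hosts fail more often than others", the three messages can be tallied per source IP straight from the log file. This is a rough sketch: the line shape in the comment is an assumption about how the handshake errors are logged, not something taken from this thread, so the regular expression may need adjusting to the real log format.

```python
import re
from collections import Counter, defaultdict

# Assumed line shape (adjust to the actual logs):
#   ... haproxy[1234]: 203.0.113.7:58713 [08/Jul/2013:10:43:33.000] www_secure/1: SSL handshake failure
ERROR_LINE = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\d+ \[[^\]]+\] \S+: "
    r"(?P<msg>SSL handshake failure"
    r"|Timeout during SSL handshake"
    r"|Connection closed during SSL handshake)",
    re.IGNORECASE,
)


def tally_handshake_errors(lines):
    """Count each handshake error message per client IP.

    Returns a dict mapping ip -> Counter keyed by the lowercased message.
    Lines that do not match the assumed shape are ignored.
    """
    per_ip = defaultdict(Counter)
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            per_ip[match.group("ip")][match.group("msg").lower()] += 1
    return per_ip
```

Feeding it an open log file (`tally_handshake_errors(open(path))`, with `path` being wherever the logs live) gives a per-host breakdown that can be compared against the successful requests from the same hosts.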

I'm using a free StartSSL certificate.

I've posted the relevant haproxy.cfg lines below. Any ideas are
extremely welcome!

defaults
    option accept-invalid-http-request
    option httplog
    log global
    mode http
    option http-server-close
    option redispatch
    timeout connect 60000ms
    timeout client 60000ms
    timeout server 60000ms
frontend www_secure
    mode http
    bind 0.0.0.0:443 ssl crt CERTNAME1.pem crt CERTNAME2.pem
    (acl's directing traffic to 2 backends)

Hi Andrei,

I suspect the issue is linked to the ECDHE cipher used (0xc014).

Could you run some tests with the ECDHE ciphers excluded from the available suites?

Please re-check whether the errors still occur after adding
ciphers AES:RC4:ALL:!aNULL:!eNULL:!LOW:!EXPORT:!SSLv2:!ECDH
to the bind line.

Regards,
Emeric

