Re: [openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-18 Thread Chad Phillips
Great, thanks for this very clear description. I passed it along to the
Licode developers, and hopefully we can put this sucker to rest.

I also included your recommendation to upgrade, which is something I’ve
been bugging them to do for a while :)

On Sun, Sep 18, 2016 at 1:37 AM, Matt Caswell wrote:

>
>
> On 18/09/16 01:01, Chad Phillips wrote:
> > On Sat, Sep 17, 2016 at 3:43 PM, Matt Caswell wrote:
> >
> > There is an OpenSSL API which is intended to resolve this issue:
> >
> > DTLSv1_handle_timeout()
> >
> > The application is expected to call this periodically during the
> > handshake if no other data has been sent or received. This causes
> > OpenSSL to check its timer and do any retransmits if necessary. If
> > licode doesn’t call this, then it’s plausible that this is the cause
> > of the issue.
> >
> >
> > “grep -r DTLSv1_handle_timeout .” in the Licode source directory returns
> > nothing, so we may have our culprit!
> >
> > Curious what versions of openssl support the DTLSv1_handle_timeout()
> > approach? I know the Licode guys run 1.0.1g, it would be great if a
> > single solution could be committed that was backwards compatible.
>
> Yes, DTLSv1_handle_timeout() is available in 1.0.1 as well. BTW there
> have been many DTLS bug and security fixes since 1.0.1g which is now
> quite old. The 1.0.1 series is now only receiving security fixes, and
> will go out of support completely at the end of the year. It is strongly
> recommended that they upgrade to a more recent version.
>
>
> >
> > Is there anything special I should know about how to use
> > DTLSv1_handle_timeout()? Just have it run on a timer until the handshake
> > completes? I guess I’m asking for some pre-documentation ;)
>
> Well the best way to use it is going to depend a lot on how the
> application is written. The API is fairly simple - just call
> DTLSv1_handle_timeout() periodically passing in the pointer to the SSL
> object. In our own s_server/s_client we just call it every time we go
> around the "select" loop on the socket. We ensure that the "select" call
> doesn't block indefinitely, but instead times out after the DTLS timer
> period has expired. We then call DTLSv1_handle_timeout() regardless of
> whether "select" has returned because the socket is readable, or because
> it has timed out. A (slightly modified and simplified) version of what
> we do in s_server is below:
>
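> /* Watch the DTLS socket for readability. */
> FD_ZERO(&readfds);
> FD_SET(s, &readfds);
>
> /* Wake up when the DTLS timer is due; with no timer running, select() may block. */
> if (DTLSv1_get_timeout(con, &timeout))
>     timeoutp = &timeout;
> else
>     timeoutp = NULL;
>
> i = select(width, (void *)&readfds, NULL, NULL, timeoutp);
>
> /* Check the timer whether select() returned on data or on timeout. */
> if (DTLSv1_handle_timeout(con) > 0) {
>     BIO_printf(bio_err, "TIMEOUT occurred\n");
> }
>
> if (i <= 0)
>     continue;
>
> if (FD_ISSET(s, &readfds))
>     read_from_sslcon = 1;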
>
> Matt
> --
> openssl-users mailing list
> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users
>


Re: [openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-18 Thread Matt Caswell


On 18/09/16 01:01, Chad Phillips wrote:
> On Sat, Sep 17, 2016 at 3:43 PM, Matt Caswell wrote:
> 
> There is an OpenSSL API which is intended to resolve this issue:
> 
> DTLSv1_handle_timeout()
> 
> The application is expected to call this periodically during the
> handshake if no other data has been sent or received. This causes
> OpenSSL to check its timer and do any retransmits if necessary. If
> licode doesn’t call this, then it’s plausible that this is the cause
> of the issue.
> 
> 
> “grep -r DTLSv1_handle_timeout .” in the Licode source directory returns
> nothing, so we may have our culprit!
> 
> Curious what versions of openssl support the DTLSv1_handle_timeout()
> approach? I know the Licode guys run 1.0.1g, it would be great if a
> single solution could be committed that was backwards compatible.

Yes, DTLSv1_handle_timeout() is available in 1.0.1 as well. BTW there
have been many DTLS bug and security fixes since 1.0.1g which is now
quite old. The 1.0.1 series is now only receiving security fixes, and
will go out of support completely at the end of the year. It is strongly
recommended that they upgrade to a more recent version.


> 
> Is there anything special I should know about how to use
> DTLSv1_handle_timeout()? Just have it run on a timer until the handshake
> completes? I guess I’m asking for some pre-documentation ;)

Well the best way to use it is going to depend a lot on how the
application is written. The API is fairly simple - just call
DTLSv1_handle_timeout() periodically passing in the pointer to the SSL
object. In our own s_server/s_client we just call it every time we go
around the "select" loop on the socket. We ensure that the "select" call
doesn't block indefinitely, but instead times out after the DTLS timer
period has expired. We then call DTLSv1_handle_timeout() regardless of
whether "select" has returned because the socket is readable, or because
it has timed out. A (slightly modified and simplified) version of what
we do in s_server is below:

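/* Watch the DTLS socket for readability. */
FD_ZERO(&readfds);
FD_SET(s, &readfds);

/* Wake up when the DTLS timer is due; with no timer running, select() may block. */
if (DTLSv1_get_timeout(con, &timeout))
    timeoutp = &timeout;
else
    timeoutp = NULL;

i = select(width, (void *)&readfds, NULL, NULL, timeoutp);

/* Check the timer whether select() returned on data or on timeout. */
if (DTLSv1_handle_timeout(con) > 0) {
    BIO_printf(bio_err, "TIMEOUT occurred\n");
}

if (i <= 0)
    continue;

if (FD_ISSET(s, &readfds))
    read_from_sslcon = 1;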
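
The same pattern can be sketched without OpenSSL, which may help when fitting it into a different event loop. This is a minimal illustration under stated assumptions, not the s_server code: wait_then_check_timer(), timer_check_fn and count_calls() are hypothetical names, the callback stands in for a call to DTLSv1_handle_timeout(), and the "remaining" argument plays the role of the value filled in by DTLSv1_get_timeout().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe() and write(), handy when trying this out */

/* Hypothetical callback type standing in for DTLSv1_handle_timeout(ssl). */
typedef int (*timer_check_fn)(void *ctx);

/*
 * One pass of the loop: wait until fd is readable or the timer period
 * elapses (remaining == NULL means "no timer running"), then run the
 * timer check in both cases. Returns 1 if fd is readable, 0 if the wait
 * timed out, -1 if select() failed.
 */
static int wait_then_check_timer(int fd, const struct timeval *remaining,
                                 timer_check_fn check, void *ctx)
{
    fd_set readfds;
    struct timeval tv, *tvp = NULL;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE && check != NULL);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    if (remaining != NULL) {
        tv = *remaining;   /* select() may modify its timeout argument */
        tvp = &tv;
    }

    n = select(fd + 1, &readfds, NULL, NULL, tvp);

    /* Called whether select() woke up for data or for the timeout. */
    check(ctx);

    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/* Example check function: only counts how often it was invoked. */
static int count_calls(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

With a real connection the callback would presumably wrap DTLSv1_handle_timeout(con), and a return value of 1 from wait_then_check_timer() would lead on to the usual SSL read path.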

Matt


Re: [openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-17 Thread Chad Phillips
On Sat, Sep 17, 2016 at 3:43 PM, Matt Caswell wrote:

> There is an OpenSSL API which is intended to resolve this issue:
>
> DTLSv1_handle_timeout()
>
> The application is expected to call this periodically during the
> handshake if no other data has been sent or received. This causes
> OpenSSL to check its timer and do any retransmits if necessary. If
> licode doesn’t call this, then it’s plausible that this is the cause of the
> issue.


“grep -r DTLSv1_handle_timeout .” in the Licode source directory returns
nothing, so we may have our culprit!

Curious what versions of openssl support the DTLSv1_handle_timeout()
approach? I know the Licode guys run 1.0.1g, it would be great if a single
solution could be committed that was backwards compatible.

Is there anything special I should know about how to use
DTLSv1_handle_timeout()? Just have it run on a timer until the handshake
completes? I guess I’m asking for some pre-documentation ;)

Thanks again for your help, this is definitely the clearest progress I’ve
made on this problem, and it’s been haunting me for months!

Chad


Re: [openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-17 Thread Matt Caswell


On 17/09/16 16:11, Chad Phillips wrote:
> Was this packet capture done on the client side or the server side or
> somewhere in the middle? There appear to be some messages missing.
> In particular I don’t see any CCS or Finished messages being
> exchanged. Is the network this runs over potentially noisy, which might
> explain packet loss?
> 
> 
> From the perspective of the DTLS handshake, my server hosting the Licode
> library is the client, and latest stable Chrome browser is the server,
> if I understand the terminology correctly. The packet capture was taken
> from the client (Licode) side.
> 
> Would the CCS or Finished messages have gotten filtered out by the
> ’dtls’ filter I applied to the packet capture? I do have the full trace
> and can re-filter to just one complete connection over a specific UDP
> port as you suggested, let me know if that would be helpful

I took another look at the packet trace. I found the CCS/Finished
messages! They are actually there but wireshark is not showing them for
some reason (at least my version of wireshark isn't).

At the end of the packet which contains three Certificate fragments, the
ClientKeyExchange and the Certificate Verify, my wireshark is then
saying "Malformed Packet". This is in relation to a load of data that is
in the packet after the Certificate Verify. Looking at it by eye the
packets look well formed so I'm not sure why wireshark is complaining.
Anyway after the Certificate Verify I can see the CCS, and an encrypted
handshake message - which will be the Finished.

What is odd is that we are seeing 3 Certificate fragments and 2
CertificateVerify fragments in a single network packet. OpenSSL will
only fragment if it thinks the MTU isn't big enough for anything larger.
It looks like licode is then combining the multiple fragments into a
single packet anyway. This is probably something to do with the way
licode is written meaning that OpenSSL is not getting the right MTU
value. I assume the licode developers are trying to compensate for that
by sending it all in one go anyway. That shouldn't cause any problems -
but it's a bit odd and it would be better to make sure OpenSSL gets the
right MTU in the first place.
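
To make the MTU arithmetic concrete, here is a small sketch of how a handshake message might be split for a given MTU. It is an illustration under assumptions rather than OpenSSL's actual logic: the function names are hypothetical, the 13-byte DTLS record header and 12-byte DTLS handshake header come from RFC 6347, and cipher overhead is ignored, which is only reasonable for the unencrypted epoch 0 part of the handshake. The 256-byte MTU used in the note after the code is a hypothetical value, picked only because it reproduces the 231-byte Certificate fragment seen elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define DTLS_RECORD_HEADER_LEN 13  /* type, version, epoch, sequence, length */
#define DTLS_HS_HEADER_LEN     12  /* type, length, seq, frag offset, frag length */

/* Largest handshake fragment payload that fits in one record of size mtu. */
static size_t max_fragment_payload(size_t mtu)
{
    const size_t overhead = DTLS_RECORD_HEADER_LEN + DTLS_HS_HEADER_LEN;

    return mtu > overhead ? mtu - overhead : 0;
}

/*
 * Number of fragments needed for a handshake message of msg_len bytes.
 * Returns 0 if the MTU is too small to carry any payload at all.
 */
static size_t fragment_count(size_t msg_len, size_t mtu)
{
    size_t payload = max_fragment_payload(mtu);

    if (payload == 0)
        return 0;
    if (msg_len == 0)
        return 1;  /* an empty message still needs one header-only fragment */
    return (msg_len + payload - 1) / payload;  /* round up */
}
```

Under these assumptions a 469-byte Certificate with a 256-byte MTU gives fragment_count(469, 256) == 3 (231 + 231 + 7 bytes), while any MTU of 494 bytes or more would let it go out as a single fragment.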

I speculate that the reason I'm seeing the "malformed packet" is that,
normally, you'd only see a maximum of 5 DTLS handshake records in a
single packet. However, because we have fragmented the Certificate and
CertificateVerify messages we've got more than 5 DTLS records in a
packet. My guess is that there is a bug in wireshark that fails if it
gets more than 5 records in one go. But that really is pure guesswork.


> 
> I see these failures only in situations where browser users with slow
> and/or lossy internet are joining, and usually when the group size gets
> to be six or more participants. The particular testing scenario that
> generated the packets you saw was a user with 225kbps upload, 5120kbps
> down, 70ms delay, 0% packet loss.
> 
> I’ll grant you those network conditions aren’t the best for group video
> chat, but if Google Hangouts can pull it off, I’d like to as well.
>
> 
> On receiving that the client should respond with a retransmit of
> the Certificate/ClientKeyExchange/CertificateVerify/CCS/Finished
> flight of messages. But it does not appear to do so…the retransmit
> does not happen until after the encrypted alert.
> 
> 
> This sounds like it might be a bug in the Licode library, not resending
> the retransmit properly?

Possibly. It could be that or it could also feasibly be a bug in
OpenSSL. However I have a theory that might explain it (but it is just a
theory).

DTLS uses a timer to retransmit messages that may have got lost. If it
hasn't had the response it expects following the last set of messages it
sent by the time the timer expires, then it retransmits them.
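
A retransmit timer of this kind can be sketched in a few lines. This is an illustration only, not OpenSSL's implementation: the type and function names are hypothetical, and the 1 second initial period, the doubling on each expiry and the 60 second cap are the values recommended in RFC 6347 (section 4.2.4.1), assumed here rather than taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

#define INITIAL_TIMEOUT_SEC 1   /* RFC 6347 recommended starting value */
#define MAX_TIMEOUT_SEC     60  /* RFC 6347 recommended cap */

struct flight_timer {           /* hypothetical type, for illustration */
    unsigned duration_sec;      /* current timeout period */
    unsigned long deadline;     /* absolute expiry time in seconds; 0 = stopped */
};

/* Arm the timer right after sending a flight of handshake messages. */
static void timer_start(struct flight_timer *t, unsigned long now)
{
    assert(t != NULL);
    t->duration_sec = INITIAL_TIMEOUT_SEC;
    t->deadline = now + t->duration_sec;
}

/*
 * Poll the timer. Returns 1 if it has expired, meaning the caller should
 * resend its last flight; the period is then doubled (up to the cap) and
 * the timer re-armed. Returns 0 if there is nothing to do yet.
 */
static int timer_poll(struct flight_timer *t, unsigned long now)
{
    assert(t != NULL);
    if (t->deadline == 0 || now < t->deadline)
        return 0;
    t->duration_sec *= 2;
    if (t->duration_sec > MAX_TIMEOUT_SEC)
        t->duration_sec = MAX_TIMEOUT_SEC;
    t->deadline = now + t->duration_sec;
    return 1;
}

/* Stop the timer once the expected response has arrived. */
static void timer_stop(struct flight_timer *t)
{
    assert(t != NULL);
    t->deadline = 0;
}
```

The point to notice is that timer_poll() only does anything when somebody calls it. A timer like this never fires on its own, so an application that stops polling also stops retransmitting.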

I wrote above that "On receiving that the client should respond with a
retransmit of the
Certificate/ClientKeyExchange/CertificateVerify/CCS/Finished flight of
messages." In reality that's a bit of an oversimplification.
What actually happens is the peer is waiting for some messages that it
doesn't receive within its timeout (either because they are lost or
delayed), and so it retransmits its own previous flight. This is why we
see the second set
of ServerHello (etc) messages from the server. The client application
usually then notices that the socket has become readable and the
application calls OpenSSL to process that data. OpenSSL reads it,
realises that they are retransmits of messages that it has already
processed and drops them. It also checks its own timer to see if the
client needs to retransmit any messages. If the client timer hasn't
expired yet then it does nothing. This could be why no retransmit
happens immediately after the second set of ServerHello messages, i.e.
the client timer hasn't expired yet.

Normally the server would continue to retransmit periodically, which
would cause the client application to try and 

Re: [openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-17 Thread Chad Phillips
Matt, thanks for the reply, very helpful so far! Answers to your questions
below:

> You don't say what version of OpenSSL.
>

The support library I’m using is Licode:
http://lynckia.com/licode/index.html

The version of openssl I have compiled into it is 1.0.2h.


> The packet trace you sent is quite confusing, as there appear to be
> two separate handshakes going on at the same time that are interleaved.
>

Yes, apologies for that, I’ll do a better job of filtering the next time.
:) Those were two separate handshakes pulled from a packet capture of a
group video chat, filtered through Wireshark using the ‘dtls’ filter.


> It seems quite clear that this is a retransmit of the earlier message
> from client to server. Retransmits are a normal part of DTLS and are
> there to handle packet loss. If a retransmitted packet is received by
> one of the peers, and it has seen that packet before, then it is simply
> ignored. Wireshark isn't ignoring it, and is reporting it as an "error"
> simply because it has seen it before.
>

Thanks for that clarification.


> Was this packet capture done on the client side or the server side or
> somewhere in the middle? There appear to be some messages missing.
> In particular I don’t see any CCS or Finished messages being exchanged.
> Is the network this runs over potentially noisy, which might explain packet
> loss?
>

From the perspective of the DTLS handshake, my server hosting the Licode
library is the client, and latest stable Chrome browser is the server, if I
understand the terminology correctly. The packet capture was taken from the
client (Licode) side.

Would the CCS or Finished messages have gotten filtered out by the ’dtls’
filter I applied to the packet capture? I do have the full trace and can
re-filter to just one complete connection over a specific UDP port as you
suggested, let me know if that would be helpful

I see these failures only in situations where browser users with slow
and/or lossy internet are joining, and usually when the group size gets to
be six or more participants. The particular testing scenario that generated
the packets you saw was a user with 225kbps upload, 5120kbps down, 70ms
delay, 0% packet loss.

I’ll grant you those network conditions aren’t the best for group video
chat, but if Google Hangouts can pull it off, I’d like to as well.


> On receiving that the client should respond with a retransmit of
> the Certificate/ClientKeyExchange/CertificateVerify/CCS/Finished
> flight of messages. But it does not appear to do so…the retransmit does
> not happen until after the encrypted alert.
>

This sounds like it might be a bug in the Licode library, not resending the
retransmit properly?


> Are both ends of the communication using OpenSSL and if so what versions?
>

From my research, I believe Chrome uses BoringSSL?


Re: [openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-16 Thread Matt Caswell


On 16/09/16 19:47, Chad Phillips wrote:
> I’m using a support library leveraging openssl to complete a DTLS handshake.

You don't say what version of OpenSSL.

The packet trace you sent is quite confusing, as there appear to be two
separate handshakes going on at the same time that are interleaved. They
are both essentially the same though, so I filtered on just one of them
(the one involving UDP port 30041 - either as a source or destination
port). The analysis below is just on that one handshake.

> 
> Occasionally, I’ll see in my packet captures that a handshake has failed
> with a “Certificate reassembly error”, and the support library doesn’t
> seem to be catching this properly to forward the error on.

The "Certificate reassembly error" isn't really an error at all. This
appears to simply be a packet that has been retransmitted. In wireshark
if you select that packet and open up "Datagram Transport Layer
Security" -> "Record Layer" -> "Handshake Protocol", you will see that
this is an epoch 0 message (i.e. initial handshake message), with
message sequence 1 (the second message sent from the Client - the first
one being a ClientHello with message sequence 0). This contains the
first 231 bytes of the full Certificate - which is 469 bytes in length.
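
For anyone who wants to check those fields without wireshark, the fixed 12-byte DTLS handshake header (RFC 6347, section 4.2.2) can be decoded by hand. The sketch below is a minimal parser with hypothetical names; it looks only at the handshake header and assumes the caller has already stripped the 13-byte record header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTLS_HS_HEADER_LEN 12

struct dtls_hs_header {        /* hypothetical type, for illustration */
    uint8_t  msg_type;         /* e.g. 11 = Certificate */
    uint32_t length;           /* length of the complete message */
    uint16_t message_seq;      /* handshake message sequence number */
    uint32_t fragment_offset;  /* where this fragment starts */
    uint32_t fragment_length;  /* how many bytes this fragment carries */
};

/* Read a 24-bit big-endian integer. */
static uint32_t read_u24(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}

/* Returns 1 on success, 0 if the buffer is too short or inconsistent. */
static int parse_dtls_hs_header(const uint8_t *buf, size_t len,
                                struct dtls_hs_header *out)
{
    if (buf == NULL || out == NULL || len < DTLS_HS_HEADER_LEN)
        return 0;
    out->msg_type = buf[0];
    out->length = read_u24(buf + 1);
    out->message_seq = (uint16_t)((buf[4] << 8) | buf[5]);
    out->fragment_offset = read_u24(buf + 6);
    out->fragment_length = read_u24(buf + 9);
    /* A fragment must lie inside the complete message. */
    if (out->fragment_offset + out->fragment_length > out->length)
        return 0;
    return 1;
}

/* True if the record carries only part of the handshake message. */
static int is_fragment(const struct dtls_hs_header *h)
{
    return h->fragment_length < h->length;
}
```

For the packet described above, the header bytes would be 0b 00 01 d5 00 01 00 00 00 00 00 e7: type 11 (Certificate), length 469, message sequence 1, fragment offset 0, fragment length 231.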

If you now compare this with the very first Certificate fragment sent
from client to server you will see that it is identical.

It seems quite clear that this is a retransmit of the earlier message
from client to server. Retransmits are a normal part of DTLS and are
there to handle packet loss. If a retransmitted packet is received by
one of the peers, and it has seen that packet before, then it is simply
ignored. Wireshark isn't ignoring it, and is reporting it as an "error"
simply because it has seen it before.
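
The "seen it before" check can be pictured as a comparison against the next handshake message sequence number the receiver expects. The sketch below is a deliberate simplification with hypothetical names: it treats each message as a whole and ignores fragment reassembly, so it is not a description of what OpenSSL does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum hs_action {
    HS_PROCESS,          /* the message we are waiting for */
    HS_DROP_RETRANSMIT,  /* already handled: silently ignore */
    HS_HOLD_FOR_LATER    /* arrived early: keep until its turn comes */
};

struct hs_receiver {              /* hypothetical type, for illustration */
    uint16_t next_expected_seq;   /* message_seq of the next new message */
};

/* Decide what to do with an incoming handshake message. */
static enum hs_action hs_classify(const struct hs_receiver *r,
                                  uint16_t message_seq)
{
    assert(r != NULL);
    if (message_seq < r->next_expected_seq)
        return HS_DROP_RETRANSMIT;
    if (message_seq > r->next_expected_seq)
        return HS_HOLD_FOR_LATER;
    return HS_PROCESS;
}

/* Call after a message classified as HS_PROCESS has been handled. */
static void hs_mark_processed(struct hs_receiver *r)
{
    assert(r != NULL);
    r->next_expected_seq++;
}
```

A packet analyser has no such state machine to consult, which may be why a harmless retransmit ends up flagged as a reassembly error.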

> 
> The library developers are considering handling this using a timeout
> solution — triggering an error if the handshake doesn’t complete in a
> specified amount of time, but this feels a bit clunky to me. What’s the
> recommended way to get this information from openssl in this case?
> 
> For reference, I’m attaching a packet capture that illustrates two of
> the handshake failures.

Was this packet capture done on the client side or the server side or
somewhere in the middle? There appear to be some messages missing. In
particular I don't see any CCS or Finished messages being exchanged. Is
the network this runs over potentially noisy, which might explain packet loss?

It is particularly odd because an epoch 1 encrypted alert can be seen to
have been sent from the server to the client which suggests that a CCS
*has* been sent from the client and received by the server - but it does
not appear in the packet trace.

Another odd thing is that it can clearly be seen that the server
retransmits the
ServerHello/Certificate/ServerKeyExchange/CertRequest/ServerHelloDone
flight. On receiving that the client should respond with a retransmit of
the Certificate/ClientKeyExchange/CertificateVerify/CCS/Finished flight
of messages. But it does not appear to do so...the retransmit does not
happen until after the encrypted alert.

Anyway, this really is a failed handshake, in which for some reason I'm
not seeing the retransmits I would expect to see in order to make it
succeed. There has been no visible error; both peers are just sitting
there waiting for the other one to do something. That's not going to
happen if the retransmits aren't working properly.

Are both ends of the communication using OpenSSL and if so what versions?

Matt




[openssl-users] How to handle DTLS Certificate Reassembly Error

2016-09-16 Thread Chad Phillips
I’m using a support library leveraging openssl to complete a DTLS handshake.

Occasionally, I’ll see in my packet captures that a handshake has failed
with a “Certificate reassembly error”, and the support library doesn’t seem
to be catching this properly to forward the error on.

The library developers are considering handling this using a timeout
solution — triggering an error if the handshake doesn’t complete in a
specified amount of time, but this feels a bit clunky to me. What’s the
recommended way to get this information from openssl in this case?

For reference, I’m attaching a packet capture that illustrates two of the
handshake failures.

Chad


dtls-failures.pcap
Description: Binary data