Hi Willy,

On Mon, Jul 12, 2021 at 9:09 PM Willy Tarreau <[email protected]> wrote:

>
> Just out of curiosity (feel free not to respond if you'd prefer not to),
> how are you using this result ? Is it to try to figure outliers by
> matching signatures against what the user-agent claims to be, or just
> for monitoring/logging or maybe for rate limiting bots ? Did you detect
> different SSL libraries for a same user-agent ?
>

TLS Fingerprinting is just another technology to give you more accurate
information regarding what clients/user agents use your service. There is a
multitude of use cases for TLS Fingerprinting:
- More accurate identification of user agents
- Blocking of malware and similar
- Logging/analytics purposes
I don't think we use this yet, but our Security department is interested in
it, so presumably we want better protection against malware and similar.


> Indeed, it makes sense to offer the option to exclude purposely added noise
> from the computation. I don't know what JA3 specifies regarding this, but I
> guess it excludes it.
>

Yes, it does; otherwise you would end up with additional fingerprints
pointing to the very same user agent.


> From what I'm seeing there, you could probably simplify the function and
> consider that you always allocate and copy if you need to exclude grease.
> A client hello is not huge anyway, and the time saved in the memcpy() of
> a few hundred bytes is not much compared to the overall processing of an
> SSL hello. By the way, it would be nice if you could use a different name
> (e.g. "temp") for your local trash pointer, as it shadows the thread-local
> "trash" and can be confusing.
>

Without going into details, I simplified the function to always copy the
data. I renamed the local trash pointer to 'output' and the ssl_capture data
pointer to 'input' to better reflect what goes in and out.
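
To illustrate the "always copy" approach, here is a minimal sketch and not
the actual patch. It assumes the list holds 2-byte big-endian values and
that GREASE entries follow the RFC 8701 pattern (0x0A0A, 0x1A1A, ...,
0xFAFA). The helper names is_grease() and copy_without_grease() are
hypothetical; only 'input' and 'output' mirror the renamed pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GREASE values (RFC 8701) have the form 0x?A?A with both bytes equal,
 * e.g. 0x0A0A, 0x1A1A, ..., 0xFAFA. */
static int is_grease(uint16_t v)
{
	return (v & 0x0f0f) == 0x0a0a && (v >> 8) == (v & 0xff);
}

/* Hypothetical helper: copy a list of 2-byte big-endian values from
 * <input> to <output>, skipping GREASE entries. It always copies and
 * never aliases the input. <output> must hold at least <len> bytes.
 * A trailing odd byte, if any, is ignored. Returns the number of bytes
 * written to <output>. */
static size_t copy_without_grease(const unsigned char *input, size_t len,
                                  unsigned char *output)
{
	size_t in, out = 0;

	assert(input != NULL && output != NULL);

	for (in = 0; in + 1 < len; in += 2) {
		uint16_t v = (uint16_t)((input[in] << 8) | input[in + 1]);

		if (is_grease(v))
			continue;
		output[out++] = input[in];
		output[out++] = input[in + 1];
	}
	return out;
}
```

With a single code path there is no need to track whether the result
points into the capture buffer or into a freshly allocated one.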


> All this would indeed make a lot of sense. However, just renaming a config
> setting is not an option. What can be done is to create the new one and
> continue to process the old one while emitting a deprecation warning asking
> to use the other one instead. Maybe the size will differ and the doc will
> need to explain how to transform the values.
>

Let's see if I can manage to do that.


> I'm having an issue with your new definition of ssl_capture_location
> and its use in ssl_capture:
>
>  /* Location and size of the data in the buffer */
>  struct ssl_capture_location {
>         unsigned char len;       // offset 0, size 1 byte, followed by a 7-byte hole
>         size_t offset;           // offset 8, size 8 bytes
>  };
>
> => this structure takes 16 bytes of memory
>
>  /* This memory pool is used for capturing clienthello parameters. */
>  struct ssl_capture {
>         unsigned long long int xxh64;
>         unsigned int protocol_version;
>         struct ssl_capture_location ciphersuite;
>         struct ssl_capture_location extensions;
>         struct ssl_capture_location ec;
>         struct ssl_capture_location ec_formats;
>         char data[VAR_ARRAY];
>  };
>
> => thus above just for the lengths we're using 64 bytes of memory, this
>    starts to be quite a lot per capture. Given that no TLS record can be
>    larger than 16kB (or is that 64?), you could use two unsigned shorts
>    and divide this overhead by 4.


It looks like it, although I came across this thread:
https://mta.openssl.org/pipermail/openssl-dev/2015-September/002845.html
which seems to suggest that theoretically we might end up with X * ~16kB.
That would call for int instead of short int. I am trying to play it safe
here :-)
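
For comparison, a rough sketch of the three layouts under discussion. The
struct names loc_orig, loc_short and loc_int and the helper loc_short_set()
are hypothetical, and the sizes in the comments assume a typical 64-bit
platform with 2-byte shorts and 4-byte ints.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Layout quoted in the review: 1-byte length plus size_t offset
 * (16 bytes with padding on a typical 64-bit platform). */
struct loc_orig {
	unsigned char len;
	size_t offset;
};

/* Hypothetical compact variant: two unsigned shorts (typically 4 bytes). */
struct loc_short {
	unsigned short offset;
	unsigned short len;
};

/* Hypothetical "play safe" variant: two unsigned ints (typically 8 bytes). */
struct loc_int {
	unsigned int offset;
	unsigned int len;
};

/* Per-capture overhead of the four locations
 * (ciphersuite, extensions, ec, ec_formats). */
static size_t overhead_orig(void)  { return 4 * sizeof(struct loc_orig); }
static size_t overhead_short(void) { return 4 * sizeof(struct loc_short); }
static size_t overhead_int(void)   { return 4 * sizeof(struct loc_int); }

/* With shorts, values above USHRT_MAX cannot be stored, so a setter has
 * to reject them. Returns 1 on success, 0 if a value does not fit. */
static int loc_short_set(struct loc_short *loc, size_t offset, size_t len)
{
	assert(loc != NULL);

	if (offset > USHRT_MAX || len > USHRT_MAX)
		return 0;
	loc->offset = (unsigned short)offset;
	loc->len = (unsigned short)len;
	return 1;
}
```

Under those assumptions the int variant still halves the overhead compared
to the original layout, without the range check the short variant needs.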


> > - Instead of creating a new converter I decided to extend the existing hex
> > converter to provide functionality similar to bin2int. I thought this
> > makes more sense as the extended hex converter is fully backward compatible.
> > It has to be noted that the extended hex converter is not strictly necessary
> > to produce a JA3 TLS Fingerprint, but might be useful in some other
> > scenarios.
>
> Actually I've already missed this ability to decode larger ints so it's
> welcome. But there's an important point that your change doesn't take into
> account (for both bin2int and hex), which is the input byte ordering. At
> the moment only big endian is supported. In addition, the "bin2int" makes
> me think it emits an integer while it emits an ASCII decimal representation
> of it.
>
> What I could suggest instead would be to add the following converters:
>   be2dec()  // big endian to decimal
>   be2hex()  // big endian to hexadecimal
>   le2dec()  // little endian to decimal
>   le2hex()  // little endian to hexadecimal
>
> We could later complete these with other less useful variants like octal
> or raw ints (e.g. to extract dates). Just like we could imagine supporting
> some flavors of varints on input later if needed for some protocols.
>

Will add new converters and name them be2hex / be2dec (hex will stay
intact).
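
To make the intended behaviour concrete, here is a standalone sketch of a
"be2dec"-style conversion. It is not the converter code itself: the
function name be_to_dec() and its signature are hypothetical, and the
handling of a short trailing chunk (converted as-is) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch: split <in> into chunks of <chunk> bytes (1..8),
 * read each chunk as a big-endian unsigned integer and print it in
 * decimal, joined by <sep>. A shorter trailing chunk is converted as-is.
 * Returns the length of the resulting string, or -1 if <out> is too
 * small. */
static int be_to_dec(const unsigned char *in, size_t len, size_t chunk,
                     char sep, char *out, size_t outsz)
{
	size_t pos = 0, used = 0;

	assert(chunk >= 1 && chunk <= 8);

	if (outsz == 0)
		return -1;
	out[0] = '\0';

	while (pos < len) {
		unsigned long long v = 0;
		size_t i;
		int n;

		/* Accumulate one chunk, most significant byte first. */
		for (i = 0; i < chunk && pos < len; i++)
			v = (v << 8) | in[pos++];

		/* Emit the separator between values, not before the first. */
		if (used) {
			if (used + 1 >= outsz)
				return -1;
			out[used++] = sep;
		}

		n = snprintf(out + used, outsz - used, "%llu", v);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

For example, the bytes 13 01 13 02 c0 2b with a chunk size of 2 and '-' as
separator give "4865-4866-49195", which is the form JA3 expects for the
cipher list.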


> > Example usage:
> > http-request set-header X-SSL-JA3 %[ssl_fc_protocol_hello_id],%[ssl_fc_cipherlist_bin(1),bin2int(-,2)],%[ssl_fc_extlist_bin(1),bin2int(-,2)],%[ssl_fc_eclist_bin(1),bin2int(-,2)],%[ssl_fc_ecformats_bin,bin2int(-,1)]
> > http-request set-header X-SSL-JA3-Hash %[req.fhdr(x-ssl-ja3),digest(md5),hex]
>
> I think in the doc you should add an example showing how to match a
> signature against those listed in file "lists/osx-nix-ja3.csv" in the
> project. It will help you verify if the solution completely works and
> is practically usable. Maybe it can involve intermediary variables
> for example.
>

I thought about it, but decided not to include it due to its complexity.
Potentially I could "spread it" over multiple lines to make it more
readable.

Regards,

Marcin Deranek
