Hi Alexander,

On Mon, Jul 31, 2023 at 01:11:35PM +0000, Stephan, Alexander wrote:
> Dear HAProxy-maintainers,
> 
> As proposed by my colleague Christian Menges in [1], I've implemented support
> for fetching arbitrary TLV values for PROXY protocol V2 via a sample fetch.

I'm afraid I don't remember seeing this message, and it was left without
a response, so it fell through the cracks; that's not cool for your colleague.

> It can be used by calling 'fc_pp_tlv' with the numerical value of the desired
> TLV type. This also fixes issue [2].

Indeed.

> Some design considerations for my approach:
> o *Do not break the existing API*: Keep methods for authority and unique ID,
> but refactor them internally with the updated, generified logic.

That's indeed preferable :-)

> o *Keep existing behavior and performance*: Pools for authority and unique ID
> are still used. Already implemented parsing logic is still applied. All
> information about a TLV payload that is already available should be used for
> validation.

OK.

> o *Small memory footprint*: As there might be up to 50k connections, an array
> or hash map is too memory intensive. Therefore, a simple list is used. It is
> the best choice here in my opinion, as there are typically only a handful of
> TLVs.

I agree. At first I thought "Hmmm, lists?" but we're speaking about very
few items here, less than a handful in general, so yes, the list is
probably the cheapest option.

> Additionally, memory pooling is used where possible. When TLV values
> become too large there is a fallback to 'malloc'.

At first glance I like this approach. We'll see in the field how it deals
with the traffic, but it should be OK. However I'm noticing that you set
a limit of 127, which is a bit unfortunate, because we switched the
sockaddr_storage to a pool a few years ago, and its entries are of size 128,
so for one extra byte you could pick items from that shared pool.

> Note that I chose to not overwrite existing TLVs in the list. This way, we
> always return the first found but would allow duplicate entries. This would
> be relevant when there are duplicate TLVs.

Good point, I hadn't thought about this. And this is desirable because
our ACLs can iterate over multiple instances of a sample. It seems that
the current function smp_fetch_fc_pp_tlv() doesn't support it by the way,
and will only return the first one. Please have a look at
smp_fetch_ssl_hello_alpn() or more generally some HTTP sample fetches that
set SMP_F_NOT_LAST. This tells the ACL engine that if the last returned
entry does not match, it can come back with the same context to retrieve
the next one, and so on. Once you reach the end, you just drop that flag
and the ACL engine has its final status. In your context you would store
a retry point (maybe even the list's pointer). There's even a
list_for_each_entry_from() to restart from a known point.
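
Just to illustrate the idea (rough, uncompiled sketch; "wanted_type", the
head name and the context slot are only placeholders, not the final API):

    struct conn_tlv_list *tlv = smp->ctx.a[0];  /* restart point, NULL on first call */
    struct list *head = &conn->tlv_list;        /* assuming a plain 'struct list' head */

    if (!tlv)
        tlv = LIST_ELEM(head->n, struct conn_tlv_list *, list);

    list_for_each_entry_from(tlv, head, list) {
        if (tlv->type != wanted_type)
            continue;
        smp->data.type = SMP_T_STR;
        smp->data.u.str.area = tlv->value.ptr;
        smp->data.u.str.data = tlv->value.len;
        /* more entries may follow: remember where to restart next time */
        smp->ctx.a[0] = LIST_ELEM(tlv->list.n, struct conn_tlv_list *, list);
        smp->flags |= SMP_F_NOT_LAST | SMP_F_CONST;
        return 1;
    }
    smp->flags &= ~SMP_F_NOT_LAST;  /* end reached, the ACL gets its final status */
    return 0;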

> Besides, I used 'strtoul' for the argument conversion to be consistent with
> other fetches that already use 'strtoul'.
> If 'strtoul' is too slow for this use case, I am not opposed to reverting it
> to the more efficient HAProxy helper.

That's not the right way to do it. When you declare a sample fetch
function, you can also declare a function that will check its arguments,
and that function can replace these args. This is done quite a bit to turn
a string into an int after verifying its range, for example. Please have a
look at sample_conv_mqtt_field_value_check() to see how that's used. This
could even allow you to implement symbolic names as aliases for known
integer values.
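
For example (totally untested sketch, the function name is made up; the
idea is just to convert the arg once at config parse time):

    static int smp_check_tlv_type(struct arg *args, char **err)
    {
        long type;
        char *endp;

        if (args[0].type != ARGT_STR) {
            memprintf(err, "expected the TLV type as a string");
            return 0;
        }

        type = strtol(args[0].data.str.area, &endp, 0);
        if (*endp != '\0' || type < 0 || type > 255) {
            memprintf(err, "invalid TLV type '%s'", args[0].data.str.area);
            return 0;
        }

        /* replace the string arg with the numeric TLV type */
        chunk_destroy(&args[0].data.str);
        args[0].type = ARGT_SINT;
        args[0].data.sint = type;
        return 1;
    }

Then you only have to reference it in the keyword entry for fc_pp_tlv and
the fetch function itself only ever sees a validated integer.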

> *Important*: I will add the documentation after the code review, when the
> design is finalized.

Yeah I understand :-)

Now the patch below:

> diff --git a/include/haproxy/connection-t.h b/include/haproxy/connection-t.h
> index eb02bbcf2..a7ebed5ec 100644
> --- a/include/haproxy/connection-t.h
> +++ b/include/haproxy/connection-t.h
> @@ -501,6 +501,17 @@ struct conn_hash_params {
>       struct sockaddr_storage *dst_addr;
>  };
>  
> +/*
> + * This structure describes an TLV entry consisting of its type
> + * and corresponding payload. This can be used to construct a list
> + * from which arbitrary TLV payloads can be fetched.
> + */
> +struct conn_tlv_list {
> +    struct list list;
> +    struct ist value;
> +    unsigned char type;
> +};
> +
>  /* This structure describes a connection with its methods and data.
>   * A connection may be performed to proxy or server via a local or remote
>   * socket, and can also be made to an internal applet. It can support
> @@ -534,10 +545,9 @@ struct connection {
>  
>       /* third cache line and beyond */
>       void (*destroy_cb)(struct connection *conn);  /* callback to notify of imminent death of the connection */
> -     struct sockaddr_storage *src; /* source address (pool), when known, otherwise NULL */
> -     struct sockaddr_storage *dst; /* destination address (pool), when known, otherwise NULL */
> -     struct ist proxy_authority;   /* Value of the authority TLV received via PROXYv2 */
> -     struct ist proxy_unique_id;   /* Value of the unique ID TLV received via PROXYv2 */
> +     struct sockaddr_storage *src;  /* source address (pool), when known, otherwise NULL */
> +     struct sockaddr_storage *dst;  /* destination address (pool), when known, otherwise NULL */
> +     struct conn_tlv_list tlv_list; /* list of TLVs received via PROXYv2 */

This one should not be of type conn_tlv_list but of type list, because
it's the list's head: its value and type members would never be used and
would needlessly waste space.
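
I.e. in struct connection it would simply be something like this (sketch),
initialized with LIST_INIT() wherever the connection is set up:

    struct list tlv_list;  /* list of conn_tlv_list entries received via PROXYv2 */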

>       /* used to identify a backend connection for http-reuse,
>        * thus only present if conn.target is of type OBJ_TYPE_SERVER
> @@ -615,10 +625,22 @@ struct mux_proto_list {
>  #define PP2_CLIENT_CERT_CONN     0x02
>  #define PP2_CLIENT_CERT_SESS     0x04
>  
> +#define TLV_HEADER_SIZE 3
> +
> +/* Max length of maximum prefix plus length of maximum hex length of ID (i.e., 0x02) */
> +#define TLV_TYPE_STR_MAX 4
> +
>  /* Max length of the authority TLV */
>  #define PP2_AUTHORITY_MAX 255
>  
> -#define TLV_HEADER_SIZE      3
> +/* Max length of a generic TLV */
> +#define PP2_GENERIC_MAX_POOLED 127
> +
> +/* Max length of an allocated TLV to prevent potential DoS attacks */
> +#define PP2_GENERIC_MAX_ALLOC 1024
> +
> +/* Max length of a CRC32C payload */
> +#define PP2_CRC32C_LEN 4

Here I'm seeing some confusion appearing (and you're not the one who
introduced it). The vast majority of the PP2_* macros were proxy-protocol
definitions, and now we're seeing new macros named following the same
syntax but which define haproxy limitations for use with the PP2. That's
quite confusing for those who try to figure out what is a protocol limit
and what is an haproxy limit. I hadn't noticed it before and thought that
PP2_AUTHORITY_MAX was a protocol limit, but apparently not, it's purely
an haproxy one. As such I'd like them to be renamed to remove this
confusion. Maybe just call them HA_PP2_*?
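
Purely to illustrate the naming I have in mind (the exact names don't
matter much):

    #define HA_PP2_AUTHORITY_MAX    255   /* haproxy's own limit, not a protocol one */
    #define HA_PP2_MAX_ALLOC       1024   /* haproxy's cap on allocated TLV payloads */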

>  struct proxy_hdr_v2 {
>       uint8_t sig[12];   /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
> diff --git a/include/haproxy/connection.h b/include/haproxy/connection.h
> index 8cf22ef4f..c1affeda7 100644
> --- a/include/haproxy/connection.h
> +++ b/include/haproxy/connection.h
> @@ -42,6 +42,8 @@ extern struct pool_head *pool_head_conn_hash_node;
>  extern struct pool_head *pool_head_sockaddr;
>  extern struct pool_head *pool_head_authority;
>  extern struct pool_head *pool_head_uniqueid;
> +extern struct pool_head *pool_head_pp_tlv_value;
> +extern struct pool_head *pool_head_pp_tlv_list;

I hadn't realized that you need two distinct pools, one for the list
elements and one for the objects. And I understand why, it's in order
to permit the allocation of larger objects using malloc. But it also
means it consumes more memory for the small cases (the pointer to the
object needs to be stored, and the size is a size_t). Storing just a
4-byte TLV would use (without its own pool):

  - 1 struct conn_tlv_list (33 bytes, padded to 40), + 8 for malloc
    bookkeeping ;

  - 1 struct tlv_value of PP2_GENERIC_MAX_POOLED (127 bytes, padded
    to 128) + 8 for malloc bookkeeping

=> we're already at 184 bytes to store any value. And larger ones will
   take 48+8+their size via malloc().

If instead we'd say that we have some conn_tlv_list like this:

    struct conn_tlv_list {
        struct list list;
        unsigned short len; // 65535 should be more than enough!
        unsigned char type;
        char contents[0];
    };

Then we'd allocate 16+2+1 + the payload size + padding + 8, i.e. ~32 +
size, and everything has direct access. For your 128-sized pools it's
only 160, with no more double dereference. It allows us to avoid
inflating small entries to 128 (BTW I just noticed that UNIQUE_ID
is 128), and that could be a good mix:

    struct conn_tlv_list {
        struct list list;
        unsigned char type;
        unsigned short len; // 65535 should be more than enough!
        char contents[0];
    };

These nodes are allocated from the various pools or using malloc()
depending on the size. The type and length are in them so you know
where to free them (i.e. known types to their pools, unknown types
to pool_free() or free() depending on the len). And it's fast and
the data has instant access. It may even encourage us to create
smaller pools if ever deemed really useful (e.g. a 4- or 8-byte
load balancing hash key for end-to-end consistency would only
take a total of 32 or 40 bytes, malloc included).
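
Just for the record, with the stored len the alloc/free side could then be
as simple as this (rough sketch, the pool and macro names are invented):

    static struct conn_tlv_list *conn_tlv_alloc(size_t len)
    {
        struct conn_tlv_list *tlv;

        if (len <= HA_PP2_TLV_VALUE_128)       /* fits in the shared pool */
            tlv = pool_alloc(pool_head_pp_tlv_128);
        else                                   /* rare larger TLVs */
            tlv = malloc(sizeof(*tlv) + len);

        if (tlv)
            tlv->len = len;
        return tlv;
    }

    static void conn_tlv_free(struct conn_tlv_list *tlv)
    {
        /* the stored len tells us where the node came from */
        if (tlv->len <= HA_PP2_TLV_VALUE_128)
            pool_free(pool_head_pp_tlv_128, tlv);
        else
            free(tlv);
    }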

> diff --git a/reg-tests/sample_fetches/tlvs.vtc b/reg-tests/sample_fetches/tlvs.vtc
> new file mode 100644
> index 000000000..9312b1df9
> --- /dev/null
> +++ b/reg-tests/sample_fetches/tlvs.vtc
> @@ -0,0 +1,57 @@
> +varnishtest "Tests for fetching PROXY protocol v2 TLVs"
> +feature ignore_unknown_macro

Thanks for the vtest, much appreciated!

> diff --git a/src/connection.c b/src/connection.c
> index 9c709034a..c60a7d14b 100644
> --- a/src/connection.c
> +++ b/src/connection.c
> @@ -46,12 +49,27 @@ struct xprt_ops *registered_xprt[XPRT_ENTRIES] = { NULL, };
>  struct mux_proto_list mux_proto_list = {
>          .list = LIST_HEAD_INIT(mux_proto_list.list)
>  };
> -
>  struct mux_stopping_data mux_stopping_data[MAX_THREADS];

Above, an unrelated blank line was removed. Nothing much to say
about the rest, except that it changes many things at once.

So now I'm seeing that the patch covers several areas:
  - cleanups of the existing code without providing a new feature
    (e.g. use PP2_CRC32C_LEN instead of 4)

  - some new infrastructure to allocate/free/lookup entries on a
    connection

  - some generic ways to expose these captures via samples (and ideally
    in a way that is compatible with the ACL's SMP_F_NOT_LAST)

  - the migration of unique_id and authority to your new infrastructure

I'd like to get them into separate patches that will ease fixes, bisects
and backports later if needed. The first one (cleanup) would also include
the PP2_ -> HA_PP2_ renaming I mentioned above to avoid further confusion.
The second one would bring your new code for alloc/free/lookup. I'm seeing
that you reused a part of smp_fetch_fc_pp_authority() anyway, so feel free
to duplicate it and only keep what you need from it. You'd get a transition
where authority/unique_id would either not be extracted by this or be
duplicated and appear in both; that's not a problem because it's just a
transition that eases reviews and bisects, so just do what's simpler and
still works. Then you can implement your sample fetch function. Then you
can migrate authority, then unique_id (each in their own patch). You will
likely have to install an args check function for each of them in order
to remap them to some specific args of your sample fetch function, and
you're done.

Overall it's already very close to being mergeable and only requires little
adaptation IMHO. If you face some particular difficulty splitting certain
steps above, please discuss it. I'm not stubborn (well I am, but not that
much). As long as things are reasonably possible I'm fine, but I'd like as
much as possible to see the split above so that we separate the new stuff
from the migration of the old one.

Thanks!
Willy
