Hi Willy,

Thanks for the nice, detailed feedback.
Overall, I agree with all of your listed points, so no need for further 
discussions. 😊
I will hopefully send the separated patches at the beginning of next week.

Just a comment and two small questions to make sure I got things correct:

> As such I'd like them to be renamed to remove this confusion.
> Maybe just call them HA_PP2_* ?

Yes, such a prefix will clean it up quite nicely IMO. Will add that to the 
first patch of the series.

> [...]
> It may even encourage us to create
> smaller pools if ever deemed really useful (e.g. a 4- or 8- byte
> load balancing hash key for end-to-end consistency would only
> take a total of 32 or 40, malloc included).

Just to make sure: Right now, we don't want to optimize further for small TLVs 
other than removing the second pool for the values and using the new struct 
with the VLA to reduce the overhead.
For normal use, a pool with 160 B could  be used now to accommodate for the new 
conn_tlv_list struct and 128 B TLVs (e.g., UNIQUE_ID)?
For the authority type, it would then be a 255 + 32 B sized pool? Maybe that 
could be used for the value range 128 <= x <= 255, and then, for > 255, malloc?
Other, smaller pools are future work?

>    struct conn_tlv_list {
>       struct list list;
>        unsigned short len; // 65535 should be more than enough!
>        unsigned char type;
>        char contents[0];
 >   };

I am also a bit confused about the second struct variant of this. IMO this is 
the optimal struct layout in regards to size, that I would like to use.
What is the other struct for, where `len` and `type` are switched?

Thanks again for your efforts, it's really interesting for me to work on 
HAProxy.

Best,
Alexander
-----Original Message-----
From: Willy Tarreau <w...@1wt.eu> 
Sent: Thursday, August 10, 2023 9:18 AM
To: Stephan, Alexander <alexander.step...@sap.com>
Cc: haproxy@formilux.org
Subject: Re: [PATCH] MEDIUM: sample: Implement sample fetch for arbitrary PROXY 
protocol v2 TLV values

[You don't often get email from w...@1wt.eu. Learn why this is important at 
https://aka.ms/LearnAboutSenderIdentification ]

Hi Alexander,

On Mon, Jul 31, 2023 at 01:11:35PM +0000, Stephan, Alexander wrote:
> Dear HAProxy-maintainers,
>
> As proposed by my colleague Christian Menges in [1], I've implemented 
> support for fetching arbitrary TLV values for PROXY protocol V2 via a sample 
> fetch.

I'm afraid I don't remember seeing this message, and it was left without 
response, so it fell through the cracks, that's not cool for your colleague.

> It can be used by calling 'fc_pp_tlv' with the numerical value of the 
> desired TLV type. This also fixes issue [2].

Indeed.

> Some design considerations for my approach:
> o *Do not break the existing API*: Keep methods for authority and 
> unique ID, but refactor them internally with the updated, generified logic.

That's indeed preferable :-)

> o *Keep existing behavior and performance*: Pools for authority and 
> unique ID are still used. Already implemented parsing logic is still 
> applied. All information about a TLV payload that is already should be 
> used for validation.

OK.

> o *Small memory footprint*: As there might be up to 50k connections, 
> an array or hash map is too memory intensive. Therefore, a simple list 
> is used. It is the best choice here in my opinion, as there are 
> typically only a handful of TLVs.

I agree. At first I thought "Hmmm lists?" but we're speaking about very few 
items here, less than a handfull in general, so yes, I agree that the list 
might probably be the cheapest option.

> Additionally, memory pooling is used where possible. When TLV values 
> become too large there is a fallback to 'malloc'.

At first glance I like this approach. We can see in field how it deals with the 
traffic, but it should be OK. However I'm noticing that you put a limit to 127, 
which is a bit unfortunate, because we switched the sockaddr_storage to a pool 
a few years ago, and they're of size 128, so for one extra byte you could pick 
items from that shared pool.

> Note that I choose to not overwrite existing TLVs in the list. This 
> way, we return always the first found but would allow duplicate 
> entries. This would be relevant when there are duplicate TLVs.

Good point, I hadn't thought about this. And this is desirable because our ACLs 
can iterate over multiple instances of a sample. It seems that the current 
function smp_fetch_fc_pp_tlv() doesn't support it by the way, and will only 
return the first one. Please have a look at
smp_fetch_ssl_hello_alpn() or more generally some HTTP sample fetches that set 
SMP_F_NOT_LAST. This tells the ACL engine that if the last returned entry does 
not match, it can come back with the same context to retrieve the next one, and 
so on. Once you reach the end, you just drop that flag and the ACL engine has 
its final status. In your context you would store a retry point (maybe even the 
list's pointer). There's even a
list_for_each_entry_from() to restart from a known point.

> Besides, I used 'strtoul' for the argument conversion to be consistent 
> with other fetches that already use 'strtoul'.
> If 'strtoul' is too slow for this use case, I am not opposed to 
> reverting it to the more efficient HAProxy helper.

That's not the right way to do it. When you declare a sample fetch function, 
you can also declare a function that will check its arguments.
And that function can replace these args. It's done quite a bit to turn a 
string to int after verifying a range for example. Please have a look at
sample_conv_mqtt_field_value_check() to see how that's used. This could even 
allow you to implement symbolic names as aliases for know integer values.

> *Important*: I will add the documentation after the code review, when 
> the design is finalized.

Yeah I understand :-)

Now the patch below:

> diff --git a/include/haproxy/connection-t.h 
> b/include/haproxy/connection-t.h index eb02bbcf2..a7ebed5ec 100644
> --- a/include/haproxy/connection-t.h
> +++ b/include/haproxy/connection-t.h
> @@ -501,6 +501,17 @@ struct conn_hash_params {
>       struct sockaddr_storage *dst_addr;  };
>
> +/*
> + * This structure describes an TLV entry consisting of its type
> + * and corresponding payload. This can be used to construct a list
> + * from which arbitrary TLV payloads can be fetched.
> + */
> +struct conn_tlv_list {
> +    struct list list;
> +    struct ist value;
> +    unsigned char type;
> +};
> +
>  /* This structure describes a connection with its methods and data.
>   * A connection may be performed to proxy or server via a local or remote
>   * socket, and can also be made to an internal applet. It can support 
> @@ -534,10 +545,9 @@ struct connection {
>
>       /* third cache line and beyond */
>       void (*destroy_cb)(struct connection *conn);  /* callback to notify of 
> imminent death of the connection */
> -     struct sockaddr_storage *src; /* source address (pool), when known, 
> otherwise NULL */
> -     struct sockaddr_storage *dst; /* destination address (pool), when 
> known, otherwise NULL */
> -     struct ist proxy_authority;   /* Value of the authority TLV received 
> via PROXYv2 */
> -     struct ist proxy_unique_id;   /* Value of the unique ID TLV received 
> via PROXYv2 */
> +     struct sockaddr_storage *src;  /* source address (pool), when known, 
> otherwise NULL */
> +     struct sockaddr_storage *dst;  /* destination address (pool), when 
> known, otherwise NULL */
> +     struct conn_tlv_list tlv_list; /* list of TLVs received via 
> + PROXYv2 */

This one should not be of type conn_tlv_list but of type list, because it's the 
list's head, and its members value and type are never used so they needlessly 
waste space.

>       /* used to identify a backend connection for http-reuse,
>        * thus only present if conn.target is of type OBJ_TYPE_SERVER 
> @@ -615,10 +625,22 @@ struct mux_proto_list {
>  #define PP2_CLIENT_CERT_CONN     0x02
>  #define PP2_CLIENT_CERT_SESS     0x04
>
> +#define TLV_HEADER_SIZE 3
> +
> +/* Max length of maximum prefix plus length of maximum hex length of 
> +ID (i.e., 0x02) */ #define TLV_TYPE_STR_MAX 4
> +
>  /* Max length of the authority TLV */  #define PP2_AUTHORITY_MAX 255
>
> -#define TLV_HEADER_SIZE      3
> +/* Max length of a generic TLV */
> +#define PP2_GENERIC_MAX_POOLED 127
> +
> +/* Max length of an allocated TLV to prevent potential DoS attacks */ 
> +#define PP2_GENERIC_MAX_ALLOC 1024
> +
> +/* Max length of a CRC32C payload */
> +#define PP2_CRC32C_LEN 4

Here I'm seeing some confusion appearing (and you're not the one who introduced 
it). The fast majority of the PP2_* macros were proxy-protocol definitions. And 
now we're seeing new macros named following the same syntax but which define 
haproxy limitations for use with the PP2. That's quite confusing for those who 
try to figure what is a protocol limit and what is an haproxy limit. I hadn't 
noticed it before and thought that PP2_AUTHORITY_MAX was a protocol limit but 
apparently not it's purely haproxy. As such I'd like them to be renamed to 
remove this confusion.
Maybe just call them HA_PP2_* ?

>  struct proxy_hdr_v2 {
>       uint8_t sig[12];   /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
> diff --git a/include/haproxy/connection.h 
> b/include/haproxy/connection.h index 8cf22ef4f..c1affeda7 100644
> --- a/include/haproxy/connection.h
> +++ b/include/haproxy/connection.h
> @@ -42,6 +42,8 @@ extern struct pool_head *pool_head_conn_hash_node;  
> extern struct pool_head *pool_head_sockaddr;  extern struct pool_head 
> *pool_head_authority;  extern struct pool_head *pool_head_uniqueid;
> +extern struct pool_head *pool_head_pp_tlv_value; extern struct 
> +pool_head *pool_head_pp_tlv_list;

I hadn't realized that you need two distinct pools, one for the list elements 
and one for the objects. And I understand why, it's in order to permit the 
allocation of larger objects using malloc. But it also means it consumes more 
memory for the small cases (the pointer to the object needs to be stored, and 
the size is a size_t). Storing just a 4-byte TLV would use (without its own 
pool):

  - 1 struct conn_tlv_list (33 bytes, padded to 40), + 8 for malloc
    bookkeeping ;

  - 1 struct tlv_value of PP2_GENERIC_MAX_POOLED (127 bytes, padded
    to 128) + 8 for malloc bookkeeping

=> we're already at 184 to store any variable. And larger ones will
   take 48+8+their size via malloc().

If instead we'd say that we have some conn_tlv_list like this:

    struct conn_tlv_list {
        struct list list;
        unsigned short len; // 65535 should be more than enough!
        unsigned char type;
        char contents[0];
    };

Then we'd allocate 16+2+1 + type_size + padding + 8, ~32 + size, and everything 
has a direct access. For your 128-sized pools it's only 160, and no more 
double-dereference. It allows us to avoid inflating small entries to 128 (BTW I 
just noticed that UNIQUE_ID is 128), and that could be a good mix:

    struct conn_tlv_list {
        struct list list;
        unsigned char type;
        unsigned short len; // 65535 should be more than enough!
        char contents[0];
    };

These nodes are allocated from the various pools or using malloc() depending on 
the size. The type and length are in them so you know where to free them (i.e. 
known types to their pools, unknown types to pool_free() or free() depending on 
the len). And it's fast and the data has instant access. It may even encourage 
us to create smaller pools if ever deemed really useful (e.g. a 4- or 8- byte 
load balancing hash key for end-to-end consistency would only take a total of 
32 or 40, malloc included).

> diff --git a/reg-tests/sample_fetches/tlvs.vtc 
> b/reg-tests/sample_fetches/tlvs.vtc
> new file mode 100644
> index 000000000..9312b1df9
> --- /dev/null
> +++ b/reg-tests/sample_fetches/tlvs.vtc
> @@ -0,0 +1,57 @@
> +varnishtest "Tests for fetching PROXY protocol v2 TLVs"
> +feature ignore_unknown_macro

Thanks for the vtest, much appreciated!

> diff --git a/src/connection.c b/src/connection.c index 
> 9c709034a..c60a7d14b 100644
> --- a/src/connection.c
> +++ b/src/connection.c
> @@ -46,12 +49,27 @@ struct xprt_ops *registered_xprt[XPRT_ENTRIES] = { 
> NULL, };  struct mux_proto_list mux_proto_list = {
>          .list = LIST_HEAD_INIT(mux_proto_list.list)
>  };
> -
>  struct mux_stopping_data mux_stopping_data[MAX_THREADS];

Above an out-of-context line break was removed. Nothing much to say about the 
rest, except that it changes many things at once.

So now I'm seeing that the patch covers several areas:
  - cleanups of the existing code without providing a new feature
    (e.g. use PP2_CRC32_LEN instead of 4)

  - some new infrastructure to allocate/free/lookup entries on a
    connection

  - some generic ways to expose these captures via samples (and ideally
    in a way that is compatible with the ACL's SMP_F_NOT_LAST

  - the migration of unique_id and authority to your new infrastructure

I'd like to get them into separate patches that will ease fixes, bisect and 
backports later if needed. The first one (cleanup) would also include the PP2_ 
-> HA_PP2_ renaming I mentioned above to avoid further confusion.
The second one would bring your new code for alloc/free/lookup. I'm seeing that 
you reused a part of smp_fetch_fc_pp_authority() anyway so feel free to 
duplicate it and only keep what you need from it. You'd get a transition where 
authority/unique_id would either not be extracted by this or be duplicated and 
appear in both, that's not a problem because it's just a transition that eases 
reviews and bisects, so just do what's simpler and still works. Then you can 
implement your sample fetch function. Then you can migrate authority, then 
unique_id (each in their own patch). You may likely have to install an args 
check function for each of them in order to remap them to some specific args of 
your sample fetch function, and you're done.

Overall it's already very close to being mergeable and only requires little 
adaptation IMHO. If you'd face some particular difficulty splitting certain 
steps above, please discuss them. I'm not stubborn (well I am but not that 
much). As long as things are reasonably possible I'm fine, but I'd like as much 
as possible to see the split above so that we separate the new stuff from the 
migration of the old one.

Thanks!
Willy

Reply via email to