Re: [PATCH] MINOR: sample: Add a regmatch converter

Willy Tarreau Tue, 13 Apr 2021 02:45:19 -0700

Hi Thayne,

On Tue, Apr 13, 2021 at 02:11:25AM -0600, Thayne McCombs wrote:
> Add a new sample converter that finds the first regex match and returns
> the substring for that match, or a capture group, if an index is
> provided.
> ---
>  doc/configuration.txt            | 22 +++++++++++
>  reg-tests/converter/regmatch.vtc | 39 +++++++++++++++++++
>  src/sample.c                     | 66 ++++++++++++++++++++++++++++++++
>  3 files changed, 127 insertions(+)
>  create mode 100644 reg-tests/converter/regmatch.vtc
> 
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index f21a29a68..e84395d23 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -16238,6 +16238,28 @@ protobuf(<field_number>,[<field_type>])
>    More information may be found here about the protocol buffers message field types:
>    https://developers.google.com/protocol-buffers/docs/encoding
>  
> +regmatch(<regex>[,<index>[,<flags>]])
> +  This extracts a substring that matches the regex pattern. It will return the first
> +  match in the input string. By default it returns the entire match, but if <index>
> +  is supplied, then the capture group for that number will be returned instead. A
> +  value of 0 returns the entire match. The regex can be made case insensitive by
> +  adding the flag "i" in <flags>.
> +
> +  It is highly recommended to enclose the regex part using protected quotes to
> +  improve clarity and never have a closing parenthesis from the regex mixed up with
> +  the parenthesis from the function. Just like in Bourne shell, the first level of
> +  quotes is processed when delimiting word groups on the line, a second level is
> +  usable for argument. It is recommended to use single quotes outside since these
> +  ones do not try to resolve backslashes nor dollar signs.
> +
> +  Examples:
> +
> +     # extract part of content-type
> +     http-request set-var(txn.imtype) 'hdr(content-type),regmatch("image/(.*)",1)'
> +
> +     # extract cookie with certain pattern
> +     http-request set-header x-test-cookie %[hdr(cookie),'regmatch(test-\w+=\d+)']
> +


I'm failing to see how it differs from regsub(), which already does the
same with a back-reference (you'd write \1 instead of 1) and also allows
composing something out of multiple matches. Am I missing something, or is
there a case where regsub() is really not convenient compared to this one?
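
For comparison, the content-type example from the patch could presumably be
written with regsub() along these lines. This is an untested sketch; the ^
and $ anchors are an assumption, added so that the whole header value is
replaced by the capture group:

     # extract part of content-type using the existing regsub() converter
     http-request set-var(txn.imtype) 'hdr(content-type),regsub("^image/(.*)$","\1")'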

Thanks,
Willy
