Hi Hyeonggeun,

finally back to this review!

On Fri, Dec 26, 2025 at 03:57:34PM +0900, Hyeonggeun Oh wrote:
> This patch implements dump_all_vars([scope],[prefix]) sample fetch
> function that dumps all variables in a given scope, optionally
> filtered by name prefix.
> 
> Output format: var1=value1, var2=value2, ...
> - String values are quoted and escaped (", \, \r, \n, \b, \0)
> - All sample types are supported via sample_convert()
> - Scope can be: sess, txn, req, res, proc
> - Prefix filtering is optional
> 
> Example usage:
>   http-request return string %[dump_all_vars(txn)]
>   http-request return string %[dump_all_vars(txn,user)]
> 
> Changes based on feedback:
> - Use get_trash_chunk() instead of alloc_trash_chunk()
> - Remove chunk_reset() as it's unnecessary with get_trash_chunk()
> - Move scope parsing to smp_check_dump_all_vars() and convert string to SINT 
> at parse time (following smp_check_date_unit pattern)
> - Add delimiter argument for customizable output format
> - Use global var_scope_names[] array instead of switch statement
> - Encode binary values as hex with 'x' prefix
> - Add support for address types (SMP_T_IPV4, SMP_T_IPV6)
> - Add support for HTTP method type (SMP_T_METH)
> - Use chunk_appendf() instead of chunk_printf() to append data
> - Check for buffer overflow and fail completely rather than truncate
> - Fix prefix filtering to skip variables without names
> - Extract string escaping logic to chunk_escape_string() in tools.c
> - Sort keywords alphabetically in sample_fetch_keywords
> - Remove SMP_F_CONST flag as trash chunk is not constant memory

Just a hint, when posting changes since a previous version of a patch,
you can place them after a line containing only "---", as this is the
end-of-message delimiter for git. It's very convenient as it allows to
place indications for the committer, questions, and even indicate changes
like above.

> diff --git a/include/haproxy/tools.h b/include/haproxy/tools.h
> index 160d5f3ac..1c6d53d02 100644
> --- a/include/haproxy/tools.h
> +++ b/include/haproxy/tools.h
> @@ -466,6 +466,13 @@ char *escape_string(char *start, char *stop,
>                   const char escape, const long *map,
>                   const char *string, const char *string_stop);
>  
> +/*
> + * Appends a quoted and escaped string to a chunk buffer. The string is
> + * enclosed in double quotes and special characters are escaped with 
> backslash.
> + * Returns 0 on success, -1 if the buffer is too small (output is rolled 
> back).
> + */
> + int chunk_escape_string(struct buffer *chunk, const char *str, size_t len);
> +

This part is independent and should be in its own patch since it's an
infrastructure change that may be used later by other code. But if you
want I can split it myself.

>  /* Below are RFC8949 compliant cbor encode helper functions, see source
>   * file for functions descriptions
>   */
> diff --git a/src/tools.c b/src/tools.c
> index 3acbc56c7..730fbc1d6 100644
> --- a/src/tools.c
> +++ b/src/tools.c
> @@ -2129,6 +2129,61 @@ char *escape_string(char *start, char *stop,
>       return NULL;
>  }
>  
> +/*
> + * Appends a quoted and escaped string to a chunk buffer. The string is
> + * enclosed in double quotes and special characters are escaped with 
> backslash:
> + * ", \, \r, \n, \b, \0
> + * Returns 0 on success, -1 if the buffer is too small (output is rolled 
> back).
> + */

I very much appreciate that you document output status codes, thank you
for that!

> + int chunk_escape_string(struct buffer *chunk, const char *str, size_t len)
> + {
> +      size_t initial_data = chunk->data;
> +      size_t i;
> +      
> +      /* Opening quote */
> +      if (chunk->data + 1 >= chunk->size)
> +              return -1;
> +      chunk->area[chunk->data++] = '"';
> +      
> +      /* Escape and append each character */
> +      for (i = 0; i < len; i++) {
> +              unsigned char c = str[i];
> +              const char *esc = NULL;
> +              size_t esc_len = 1;
> +              
> +              if (c == '"') { esc = "\\\""; esc_len = 2; }
> +              else if (c == '\\') { esc = "\\\\"; esc_len = 2; }
> +              else if (c == '\r') { esc = "\\r"; esc_len = 2; }
> +              else if (c == '\n') { esc = "\\n"; esc_len = 2; }
> +              else if (c == '\b') { esc = "\\b"; esc_len = 2; }
> +              else if (c == '\0') { esc = "\\0"; esc_len = 2; }
> +              
> +              if (esc) {
> +                      if (chunk->data + esc_len >= chunk->size) {
> +                              chunk->data = initial_data;
> +                              return -1;
> +                      }
> +                      memcpy(chunk->area + chunk->data, esc, esc_len);
> +                      chunk->data += esc_len;

I honestly think that it's slightly overkill to resort to memcpy() to
copy exactly two bytes whose first one is always a backslash, but for
the current use case that's OK. We can refine it later if we need to,
maybe by relying on the maps used by escape_string() for example.

> +              } else {
> +                      if (chunk->data + 1 >= chunk->size) {
> +                              chunk->data = initial_data;
> +                              return -1;
> +                      }
> +                      chunk->area[chunk->data++] = c;
> +              }
> +      }
> +      
> +      /* Closing quote */
> +      if (chunk->data + 1 >= chunk->size) {
> +              chunk->data = initial_data;
> +              return -1;
> +      }
> +      chunk->area[chunk->data++] = '"';
> +      
> +      return 0;
> + }
> +
>  /* CBOR helper to encode an uint64 value with prefix (3bits MAJOR type)
>   * according to RFC8949
>   *
> diff --git a/src/vars.c b/src/vars.c
> index 2654155ce..6966e23de 100644

So this is the part that really belongs to this patch.

> --- a/src/vars.c
> +++ b/src/vars.c
> @@ -58,6 +58,16 @@ static struct var_set_condition conditions_array[] = {
>         { NULL, 0 }
>  };
>  
> +/* Variable scope names with their prefixes for output */
> +static const char *var_scope_names[] = {
> +     [SCOPE_SESS] = "sess.",
> +     [SCOPE_TXN]  = "txn.",
> +     [SCOPE_REQ]  = "req.",
> +     [SCOPE_RES]  = "res.",
> +     [SCOPE_PROC] = "proc.",
> +     [SCOPE_CHECK] = "check.",
> +};
> +
>  /* returns the struct vars pointer for a session, stream and scope, or NULL 
> if
>   * it does not exist.
>   */
> @@ -352,6 +362,209 @@ static int smp_fetch_var(const struct arg *args, struct 
> sample *smp, const char
>       return vars_get_by_desc(var_desc, smp, def);
>  }
>  
> +/* Dumps all variables in the specified scope, optionally filtered by prefix.
> + * Output format: var1=value1, var2=value2, ...
> + * String values are quoted and escaped, binary values are hex-encoded 
> (x...).
> + * Returns 1 on success, 0 on failure (buffer too small).
> + * Note: When using prefix filtering, all variables are still visited, so 
> this
> + * should not be used with configs involving thousands of variables.
> + */

Regarding this last comment, one option we could have is to get rid of
the hash and index the strings only. It would save 8 bytes of RAM per
variable and list them in alphabetic order. However I sense that it
will be much slower, especially with the many configs that use similar
prefixes for many variables. But that's food for future experiments.

> +static int smp_fetch_dump_all_vars(const struct arg *args, struct sample 
> *smp, const char *kw, void *private)
> +{
> +     struct vars *vars;
> +     struct var *var;
> +     struct var_desc desc;
> +     const char *prefix = NULL;
> +     size_t prefix_len = 0;
> +     const char *delim = ", ";
> +     size_t delim_len = 2;
> +     int first = 1;
> +     int i;
> +     
> +     struct buffer *output;
> +     
> +     /* Get output buffer */
> +     output = get_trash_chunk();
> +     chunk_reset(output);
> +     
> +     /* Parse arguments */
> +     if (args[0].type == ARGT_SINT) {
> +             desc.scope = args[0].data.sint;
> +     } else {
> +             /* Auto-detect scope from context*/
> +             if (smp->strm)
> +                     desc.scope = SCOPE_TXN;
> +             else if (smp->sess)
> +                     desc.scope = SCOPE_SESS;
> +             else
> +                     desc.scope = SCOPE_PROC;
> +     }       
> +
> +     /* Optional prefix filter */
> +     if (args[1].type == ARGT_STR) {
> +             prefix = args[1].data.str.area;
> +             prefix_len = args[1].data.str.data;
> +     }
> +     
> +     /* Optional delimiter */
> +     if (args[2].type == ARGT_STR) {
> +             delim = args[2].data.str.area;
> +             delim_len = args[2].data.str.data;
> +     }
> +     
> +     desc.flags = 0;
> +     desc.name_hash = 0;
> +     
> +     vars = get_vars(smp->sess, smp->strm, &desc);
> +     if (!vars || vars->scope != desc.scope) 
> +             return 0;
> +     
> +     vars_rdlock(vars);
> +     
> +     /* Iterate through all variable roots */
> +     for (i = 0; i < VAR_NAME_ROOTS; i++) {
> +             var = cebu64_item_first(&vars->name_root[i], name_node, 
> name_hash, struct var);
> +             
> +             while (var) {
> +                     const char *scope_prefix;
> +                     size_t j;
> +
> +                     /* Check prefix filter */
> +                     if (prefix) {
> +                             if (!var->name || strncmp(var->name, prefix, 
> prefix_len) != 0) {
> +                                     var = 
> cebu64_item_next(&vars->name_root[i], name_node, name_hash, var);
> +                                     continue;
> +                             }
> +                     }
> +                     
> +                     /* Add delimiter */
> +                     if (!first) {
> +                             if (output->data + delim_len >= output->size) {
> +                                     vars_rdunlock(vars);
> +                                     output->data = 0;
> +                                     return 0;
> +                             }

I'm seeing this 4-line construct repeated many times below. It's
typically error handling that would be better served by a goto to
an error label at the bottom of the function. For example, call it
"fail_unlock" so that we know that it unlocks, and just goto there.

> +                     } else if (var-> data.type == SMP_T_BIN) {
> +                             /* Binary: hex encode with 'x' prefix */

I'm not convinced about the benefit of the 'x' prefix here. Is this in
order to avoid any ambiguity with any other encoding ? Maybe with ints ?
E.g. "x90" vs "90" ? If so that's fine, but I could be missing something.

> +                             const char *bin = var->data.u.str.area;
> +                             size_t len = var->data.u.str.data;
> +                             
> +                             if (output->data + 1 + len * 2 >= output->size) 
> {
> +                                     vars_rdunlock(vars);
> +                                     output->data = 0;
> +                                     return 0;
> +                             }       
> +
> +                             chunk_memcat(output, "x", 1);
> +                             for(j = 0; j < len; j++) {
> +                                     if (chunk_appendf(output, "%02x", 
> (unsigned char)bin[j]) < 0) {
> +                                             vars_rdunlock(vars);
> +                                             output->data = 0;
> +                                             return 0;
> +                                     }
> +                             }

Hint: there's dump_binary() which could probably do that for you even
simpler.

> +                     } else if (var->data.type == SMP_T_SINT) {
> +                             /* Integer */
> +                             if (chunk_appendf(output, "%lld", (long 
> long)var->data.u.sint) < 0) {
> +                                     vars_rdunlock(vars);
> +                                     output->data = 0;
> +                                     return 0;
> +                             }
> +                             
> +                     } else if (var->data.type == SMP_T_BOOL) {
> +                             /* Boolean */
> +                             const char *bool_str = var->data.u.sint ? 
> "true" : "false";
> +                             if (chunk_appendf(output, "%s", bool_str) < 0) {
> +                                     vars_rdunlock(vars);
> +                                     output->data = 0;
> +                                     return 0;
> +                             }
> +                     
> +                     } else if (var->data.type == SMP_T_IPV4 || 
> var->data.type == SMP_T_IPV6) {
> +                             /* Address */
> +                             char addr_str[INET6_ADDRSTRLEN];
> +                             if (addr_to_str(&var->data.u.ipv4, addr_str, 
> sizeof(addr_str)) == NULL) {
> +                                     if (chunk_appendf(output, "(addr)") < 
> 0) {
> +                                             vars_rdunlock(vars);
> +                                             output->data = 0;
> +                                             return 0;
> +                                     }
> +                             } else {
> +                                     if (chunk_appendf(output, "\"%s\"", 
> addr_str) < 0) {
> +                                             vars_rdunlock(vars);
> +                                             output->data = 0;
> +                                             return 0;
> +                                     }
> +                             }

If using double-quotes, how do you distinguish these addresses from pure
strings ? I know that some tools like to enclose IP addresses between
square brackets. I'm not particularly fond of them when not needed, but
maybe here that can avoid an ambiguity.

(...)
>  /*
>   * Clear the contents of a variable so that it can be reset directly.
>   * This function is used just before a variable is filled out of a sample's
> @@ -916,6 +1129,64 @@ static int smp_check_var(struct arg *args, char **err)
>       return vars_check_arg(&args[0], err);
>  }
>  
> +/* This function checks all arguments for dump_all_vars()
> + * Args: [scope], [prefix], [delimiter]
> + * Both arguments are optional
> + */
> +static int smp_check_dump_all_vars(struct arg *args, char **err)
> +{
> +     /* First argument (scope) is optional */
> +
> +     if (args[0].type == ARGT_STR) {
> +             const char *scope = args[0].data.str.area;
> +             size_t len = args[0].data.str.data;
> +             long long int scope_id;
> +             
> +             if (len == 4 && strncmp(scope, "sess", 4) == 0) {

Unless I'm mistaken, the args that you get from ARGT_STR are all
zero-terminated so you don't even need to use len+strncmp(). BTW,
This is only used for parsing the sample fetch arguments at boot
time, so this is not on the critical path (it will not be used
that often), so you can actually leverage the scope names array
you've added to find the scope using a loop, and avoid names
duplication.

> +     /* Third argument (delimiter) is optional */
> +     if (args[2].type != ARGT_STR && args[2].type != ARGT_STOP) {
> +             memprintf(err, "third argument must be a string (delimieter) or 
> omitted");
                                                                       ^
Warning, extra 'e' above.

Overall that's OK. I'm almost tempted to merge it as-is despite the
few comments but I'd prefer to give you an opportunity to refine it
before merging. Just let me know.

thanks,
Willy


Reply via email to