Hi,

On Fri, Nov 07, 2025 at 04:19:38PM +0900, Hyeonggeun Oh wrote:
> This patch implements dump_all_vars([scope],[prefix]) sample fetch
> function that dumps all variables in a given scope, optionally
> filtered by name prefix.
> 
> Output format: var1=value1, var2=value2, ...
> - String values are quoted and escaped (", \, \r, \n, \b, \0)
> - All sample types are supported via sample_convert()
> - Scope can be: sess, txn, req, res, proc
> - Prefix filtering is optional
> 
> Example usage:
>   http-request return string %[dump_all_vars(txn)]
>   http-request return string %[dump_all_vars(txn,user)]

Pretty nice description, thanks. Stupid question, do we have a way to
figure the variable's type based on its value ? I get it that strings
are quoted, for binary I don't know, addresses should be distinguishable
from integers, booleans are probably true/false or maybe 0/1 ? And there
are HTTP methods, which probably are the only ones starting with letters
without quotes ? So maybe there's no ambiguity ?

> +/* Dumps all variables in the specified scope, optionally filtered by prefix.
> + * Output format: var1=value1, var2=value2, ...
> + * String values are quoted and escaped.
> + */

Please do not forget to include return domain and potential failure cases
(e.g. what happens if the output doesn't fit, do we truncate or report an
error?). BTW I'd prefer that we fail in such cases rather than truncating,
as we've done this mistake in some dumps in the past and it's terrible
because sometimes a large output doesn't fit and is skipped, but the next
one is OK and passes, and the other side that consumes what we produce
has no way to know about a truncation event.

> +static int smp_fetch_dump_all_vars(const struct arg *args, struct sample 
> *smp, const char *kw, void *private)
> +{
> +     struct vars *vars;
> +     struct var *var;
> +     struct var_desc desc;
> +     const char *prefix = NULL;
> +     size_t prefix_len = 0;
> +     int first = 1;
> +     int i;
> +     
> +     /* Allocate output buffer */
> +     struct buffer *output = alloc_trash_chunk();
> +     if (!output)
> +             return 0;

For such simple functions that don't call complicated ones, you can
use get_trash_chunk() that will just pick the "next" trash chunk, that
cannot fail and that you don't need to release.

> +     
> +     chunk_reset(output);

This one is guaranteed by both alloc_trash_chunk() and get_trash_chunk()
so you don't need it.

> +     /* Parse arguments */
> +     if (args[0].type == ARGT_STR) {
> +             const char *scope_str = args[0].data.str.area;
> +             size_t scope_len = args[0].data.str.data;
> +             
> +             if (scope_len == 4 && strncmp(scope_str, "sess", 4) == 0)
> +                     desc.scope = SCOPE_SESS;
> +             else if (scope_len == 3 && strncmp(scope_str, "txn", 3) == 0)
> +                     desc.scope = SCOPE_TXN;
> +             else if (scope_len == 3 && strncmp(scope_str, "req", 3) == 0)
> +                     desc.scope = SCOPE_REQ;
> +             else if (scope_len == 3 && strncmp(scope_str, "res", 3) == 0)
> +                     desc.scope = SCOPE_RES;
> +             else if (scope_len == 4 && strncmp(scope_str, "proc", 4) == 0)
> +                     desc.scope = SCOPE_PROC;
> +             else {
> +                     free_trash_chunk(output);
> +                     return 0;
> +             }

It is preferable to move all this to a function that handles the keyword
parsing/validation and that will set update the args accordingly. Oh I'm
seeing that you already have smp_check_dump_all_vars(). Perfect. Please
have a look at smp_check_date_unit() for an example of how to update the
args for runtime consumption. It takes a string arg on input, then replaces
it with an int that the runtime function will use, exactly like here with
desc.scope that can be taken from the first arg.

> +     vars_rdlock(vars);
> +     
> +     /* Iterate through all variable roots */
> +     for (i = 0; i < VAR_NAME_ROOTS; i++) {
> +             var = cebu64_item_first(&vars->name_root[i], name_node, 
> name_hash, struct var);
> +             
> +             while (var) {
> +                     /* Check prefix filter */
> +                     if (prefix && var->name) {
> +                             if (strncmp(var->name, prefix, prefix_len) != 
> 0) {
> +                                     var = 
> cebu64_item_next(&vars->name_root[i], name_node, name_hash, var);
> +                                     continue;
> +                             }
> +                     }

Hmmm interesting, I hadn't thought about this: since we only index hashes,
not names, you're forced to scan all variables to pick the right ones. Maybe
it's just an indication that we should stop indexing hashes and index the
real names instead. It's not critical, as those using that will not be the
same as those having hundreds of thousands of variables. But ultimately
this could happen in the same config for two distinct use cases. I'm not
opposed to keeping it like this for now and revisiting it later. However
if you're interested in changing the indexing to get rid of the hash now
and just index on the string, you'd be welcome! My only concern is that
hashes tend to be well distributed while strings can be slower to look up
due to common prefixes, and I'm also careful about not slowing down the
most common use case.

Let's maybe just address this with a note in the doc about the usage
of the keyword, indicating that even when a prefix is specified, all
variables are visited and that this must not be used with configs
involving thousands of variables.

> +                     
> +                     /* Add comma separator */
> +                     if (!first) {
> +                             chunk_appendf(output, ", ");

I anticipate that some users will quickly ask for this delimiting string
to be configurable (e.g. optional arg3). Some might indeed prefer a single
char (to ease splitting without having to trim spaces), others might prefer
a char that isn't used as an HTTP header field delimiter, etc. Maybe it's
not much to permit to configure it, it will just require to pass the prefix
and users can put "" there.

> +                     /* Add variable name with scope prefix */
> +                     if (var->name) {
> +                             const char *scope_prefix = "";
> +                             switch (desc.scope) {
> +                                     case SCOPE_SESS: scope_prefix = 
> "sess."; break;
> +                                     case SCOPE_TXN:  scope_prefix = "txn."; 
> break;
> +                                     case SCOPE_REQ:  scope_prefix = "req."; 
> break;
> +                                     case SCOPE_RES:  scope_prefix = "res."; 
> break;
> +                                     case SCOPE_PROC: scope_prefix = 
> "proc."; break;
> +                                     default: break;

Probably that it's time to have a global var_scope_str[] array in vars.c to
directly access these ?

> +                             chunk_appendf(output, "%s%s=", scope_prefix, 
> var->name);
> +                     } else {

When does it happen that the name is not set ?

> +                             chunk_appendf(output, "var_%016llx=", (unsigned 
> long long)var->name_hash);
> +                     }
> +
> +                     /* Convert value based on type - WITHOUT 
> sample_convert() */
> +                     if (var->data.type == SMP_T_STR || var->data.type == 
> SMP_T_BIN) {
> +                             /* String: quote and escape */

OK I have my response for binary vars :-) I think we'd have to encode it
in hex because binary contains anything (e.g. AES-encrypted data) and
cannot be safely represented nor transported as a string, even when
encoding some chars. Maybe we should just send a single letter "x"
followed by hex digits, without quotes. This will remain distinguishable
from numbers and strings. E.g:
 
       var_foo=x666f6f,var_bar=x626172

> +                             chunk_appendf(output, "\"");
> +                             const char *str = var->data.u.str.area;
> +                             size_t len = var->data.u.str.data;
> +                             
> +                             for (size_t j = 0; j < len && output->data < 
> output->size - 10; j++) {
> +                                     unsigned char c = str[j];
> +                                     if (c == '"') chunk_appendf(output, 
> "\\\"");
> +                                     else if (c == '\\') 
> chunk_appendf(output, "\\\\");
> +                                     else if (c == '\r') 
> chunk_appendf(output, "\\r");
> +                                     else if (c == '\n') 
> chunk_appendf(output, "\\n");
> +                                     else if (c == '\b') 
> chunk_appendf(output, "\\b");
> +                                     else if (c == '\0') 
> chunk_appendf(output, "\\0");
> +                                     else chunk_appendf(output, "%c", c);
> +                             }
> +                             chunk_appendf(output, "\"");

In my opinion this part should be added as a function in tools.c, near
encode_string(), encode_chunk(), escape_string(), and qstr(). In fact
yours could be a cleaner replacement for qstr() though I don't know
where it's used so I wouldn't play games with that. But for me what
you've done here is sort of a "safe" qstr. However it doesn't check
for truncation, so you could end up with a truncated output (and
possibly one that misses the closing quote, which could be problematic
in some cases if such a string is later then trimmed then re-concatenated
to other ones, in which case it could be confused with commas and names).

That's exactly where a dedicated function taking care of all of this
would be better. Just like qstr() you could guarantee that the closing
quote is present, and return a status indicating if the output was
truncated or not.

> +                             
> +                     } else if (var->data.type == SMP_T_SINT) {
> +                             /* Integer */
> +                             chunk_appendf(output, "%lld", (long 
> long)var->data.u.sint);
> +                             
> +                     } else if (var->data.type == SMP_T_BOOL) {
> +                             /* Boolean */
> +                             chunk_appendf(output, "%s", var->data.u.sint ? 
> "true" : "false");
> +                             
> +                     } else {
> +                             /* Other types: unconvertible */
> +                             chunk_appendf(output, "(type:%d)", 
> var->data.type);

Some might want to put addresses there (e.g. copies of src/dst etc). We
have addr_to_str() that is already used for this. It does have its
limitations (only INET/INET6 decoded, UNIX says "unix"), but that could
be sufficient if documented as such.

> @@ -915,6 +1061,43 @@ static int smp_check_var(struct arg *args, char **err)
>       return vars_check_arg(&args[0], err);
>  }
>  
> +/* This function checks all arguments for dump_all_vars()
> + * Args: [scope], [prefix]
> + * Both arguments are optional strings
> + */
> +static int smp_check_dump_all_vars(struct arg *args, char **err)
> +{
> +     /* First argument (scope) is optional */
> +     if (args[0].type != ARGT_STR && args[0].type != ARGT_STOP) {
> +             memprintf(err, "first argument must be a string (scope) or 
> omitted");
> +             return 0;
> +     }
> +     
> +     /* If scope is provided, validate it */
> +     if (args[0].type == ARGT_STR) {
> +             const char *scope = args[0].data.str.area;
> +             size_t len = args[0].data.str.data;
> +             
> +             if (!((len == 4 && strncmp(scope, "sess", 4) == 0) ||
> +                   (len == 3 && strncmp(scope, "txn", 3) == 0) ||
> +                   (len == 3 && strncmp(scope, "req", 3) == 0) ||
> +                   (len == 3 && strncmp(scope, "res", 3) == 0) ||
> +                   (len == 4 && strncmp(scope, "proc", 4) == 0))) {
> +                     memprintf(err, "invalid scope '%.*s', must be one of: 
> sess, txn, req, res, proc",
> +                              (int)len, scope);
> +                     return 0;
> +             }

So this is the part I mentioned above that can finish the job by storing
the scope as an int into agsr[0] so that the runtime function doesn't
have to parse it again.

> @@ -1409,6 +1592,7 @@ INITCALL0(STG_PREPARE, vars_init);
>  static struct sample_fetch_kw_list sample_fetch_keywords = {ILH, {
>  
>       { "var", smp_fetch_var, ARG2(1,STR,STR), smp_check_var, SMP_T_ANY, 
> SMP_USE_CONST },
> +     { "dump_all_vars", smp_fetch_dump_all_vars, ARG2(0,STR,STR), 
> smp_check_dump_all_vars, SMP_T_STR, SMP_USE_CONST },

Please keep keywords alphabetically sorted, they're easier to spot for
future additions.

Willy


Reply via email to