On 14.04.21 04:36, Willy Tarreau wrote:
On Wed, Apr 14, 2021 at 03:02:20AM +0200, Aleksandar Lazic wrote:
But then, could it make sense to also support "strict integers": values
that can accurately be represented as integers and which are within the
JSON valid range for integers (-2^52 to 2^52 with no decimal part) ?
This would then make the converter return nothing in case of violation
(i.e. risk of losing precision). This would also reject NaN and infinite
that the lib supports.

You mean the same check which is done in arith_add().

Not exactly because arith_add only checks for overflows after addition
and tries to cap the result, but I'd rather just say that if the decoded
number is <= -2^53 or >= 2^53 then the converter should return a no match
in case an integer was requested.

Okay got you.

There is such a check in stats.c which I copied to sample.c but this does
not look right.

Maybe I should create a include/haproxy/json-t.h and add the values there,
what do you think?


```
/* Limit JSON integer values to the range [-(2**53)+1, (2**53)-1] as per
 * the recommendation for interoperable integers in section 6 of RFC 7159.
 */
#define JSON_INT_MAX ((1ULL << 53) - 1)
#define JSON_INT_MIN (0 - JSON_INT_MAX)

/* This sample function get the value from a given json string.
 * The mjson library is used to parse the json struct
 */
static int sample_conv_json_query(const struct arg *args, struct sample *smp, 
void *private)
{
        struct buffer *trash = get_trash_chunk();
        const char *p; /* holds the temporay string from mjson_find */
        int tok, n;    /* holds the token enum and the length of the value */
        int rc;        /* holds the return code from mjson_get_string */

        tok = mjson_find(smp->data.u.str.area, smp->data.u.str.data, 
args[0].data.str.area, &p, &n);

        switch(tok) {
                case MJSON_TOK_NUMBER:
                        if (args[1].type == ARGT_SINT) {
                                smp->data.u.sint = atoll(p);

                                if (smp->data.u.sint < JSON_INT_MIN || 
smp->data.u.sint > JSON_INT_MAX)
                                        return 0;

                                smp->data.type = SMP_T_SINT;
                        } else {
...
```

Hmmmm that's not your fault but now I'm seeing that we already have a
converter inappropriately called "json", so we don't even know in which
direction it works by just looking at its name :-(  Same issue as for
base64.

May I suggest that you call yours "json_decode" or maybe shorter
"json_dec" so that it's more explicit that it's the decode one ? Because
for me "json_string" is the one that will emit a json string from some
input (which it is not). Then we could later create "json_enc" and warn
when "json" alone is used. Or even "jsdec" and "jsenc" which are much
shorter and still quite explicit.

How about "json_query" because it's exactly what it does :-)

I'm not familiar with the notion of "query" to decode and extract contents
but I'm not the most representative user and am aware of the "jq" command-
line utility that does this. So if it sounds natural to others I'm fine
with this.

I'm seeing that there's a very nice mjson_find() which does *exactly* what
you need:

     "In a JSON string s, len, find an element by its JSONPATH path. Save
      found element in tokptr, toklen. If not found, return JSON_TOK_INVALID.
      If found, return one of: MJSON_TOK_STRING, MJSON_TOK_NUMBER,
      MJSON_TOK_TRUE, MJSON_TOK_FALSE, MJSON_TOK_NULL, MJSON_TOK_ARRAY,
      MJSON_TOK_OBJECT.
      If a searched key contains ., [ or ] characters, they should be escaped
      by a backslash."

So you get the type in return. I think you can then call one of the
related functions depending on what is found, which is more reliable
than iterating over multiple attempts.

Oh yes, this sounds like a better approach.
I have now used this suggestion and I hope you can help me to fix the double 
parsing
issue or is it acceptable to parse the input twice?

 From what I've seen in the code in the lib you have no other option.
I thought it might be possible to call mjson_get_string() on the
resulting pointer but you would need to find a way to express that
you want to extract the immediate content, maybe by having an empty
key designation or something like this. This point is not clear to
me and the unit tests in the project all re-parse the input string
after mjson_find(), so probably this is the way to do it.

The check functions handles the int arg now as suggested.

```
/* This function checks the "json_query" converter's arguments. */
static int sample_check_json_query(struct arg *arg, struct sample_conv *conv,
                            const char *file, int line, char **err)
{
        if (arg[0].data.str.data == 0) { /* empty */
                memprintf(err, "json_path must not be empty");
                return 0;
        }

        if (arg[1].data.str.data != 0) {
                        if (strncmp(arg[1].data.str.area, "int", sizeof("int")) 
!= 0) {
                                memprintf(err, "output_type only supports \"int\" as 
argument");
                                return 0;
                        } else {
                                arg[1].type = ARGT_SINT;
                                arg[1].data.sint = 0;

OK!

I use now the token enum but I have some troubles to handle the c types 
properly.

OK that's better, but be careful about this:

                        trash->data = mjson_get_string(smp->data.u.str.area, 
smp->data.u.str.data, args[0].data.str.area, trash->area, trash->size);
                        if (trash->data == -1 ) {

You're sending the result into trash->data whose type you don't know,
and expect it to match against -1. I'd rather store the result into
a temporary variable of the same type as the function's return (int),
test it, and only if equal, I'd assign this value to the return sample's
length.

This was my first attempt ;-)
Now it's this.

```
                        rc = mjson_get_string(smp->data.u.str.area, 
smp->data.u.str.data, args[0].data.str.area, trash->area, trash->size);
                        if (rc == -1 ) {
                                /* invalid string */
                                return 0;
                        } else {
                                trash->data = rc;
                                smp->data.u.str = *trash;
                                smp->data.type = SMP_T_STR;
                        }
```
Willy



Reply via email to