On 21 Dec 2023, at 06:11, Ruediger Pluem <rpl...@apache.org> wrote:

>> +APR_DECLARE(apr_status_t) apr_escape_json(char *escaped, const char *str,
>> +        apr_ssize_t slen, int quote, apr_size_t *len)
> 
> Shouldn't we provide a length for the escaped buffer to be able to do some 
> sanity checking against overflows?
> We could ignore this parameter in case that escaped is NULL.
> Otherwise I think we should have a big fat warning in the doxygen 
> documentation of this function that it
> is extremely dangerous to call this function without providing a buffer of 
> sufficient length and that
> apr_escape_json should be called with escaped equal NULL before to determine 
> this length, just like as
> we do in apr_pescape_json.

All of the apr_escape_[not-p]*() functions work the same way. Pass NULL for the 
destination (escaped), and you will get the exact length of the buffer you need 
(including NUL byte). The apr_escape_p*() functions all show examples of this 
being used.

In most cases you also can potentially get a valid length and APR_NOTFOUND, 
which means “don’t bother allocating any memory and wasting cycles doing a 
conversion or copying, no conversion is necessary, but if you do decide to, 
that’s fine too”.

Uniquely to apr_escape_json(), some protocols require you to care whether the 
input was valid UTF-8 or not (see JWT). In these cases you get a valid length, 
and APR_EINVAL to warn you that an invalid character is present. All invalid 
characters are removed and replaced by the UTF8 “replacement” character, so the 
output is always a valid UTF8 string (again, see JWT). Again, while logging for 
example you might be fine with invalid characters, the output will always be 
valid UTF8 so it’s up to the end user to decide how much they care.

This is (or should be if not) covered in the doxygen already in the header file.

https://github.com/apache/apr/blob/trunk/include/apr_escape.h#L34

In short, the apr_escape API tries to allocate the least memory possible, 
avoids strlen(), and does the least work possible during conversion.

>> +                        else if (((c >> 7) == 0x00)) {
> 
> How can this case happen? c's like this should fail the TEST_CHAR(c, 
> T_ESCAPE_JSON) test.

This is for completeness, it felt weird to leave this case blank.

>> +                        else if (((c >> 7) == 0x00)) {
> 
> How can this case happen? c's like this should fail the TEST_CHAR(c, 
> T_ESCAPE_JSON) test.

Same here.

>> +APR_DECLARE(const char *) apr_pescape_json(apr_pool_t *p, const char *src,
>> +        apr_ssize_t srclen, int quote)
> 
> The caller cannot easily determine if the src contained invalid data or was 
> already
> properly escaped.
> Can't we do a
> 
> APR_DECLARE(apr_status_t) pr_pescape_json(apr_pool_t *p, char **dest, const 
> char *src,
>        apr_ssize_t srclen, int quote)
> 
> instead which returns the status of apr_escape_json and the escaped JSON in 
> dest?

If the caller cares about valid/invalid data, they need to use 
apr_escape_json() to give them the full detail as a apr_status_t. Just the act 
of determining the length will tell if the input is invalid, so no cycles are 
wasted by the caller allocating anything.

The “caller doesn’t care” case is apr_pescape_json(), which is “you’ll always 
get a valid UTF8 string out, regardless of what went in”.

Regards,
Graham
—

Reply via email to