[Graham please use plain text emails or at least fix your MUA so that
one does not need to re-quote/format your messages when replying in
plain text]

On Wed, Apr 24, 2024 at 7:51 PM Graham Leggett <minf...@sharp.fm> wrote:
>
> On 24 Apr 2024, at 14:03, Yann Ylavic <ylavic....@gmail.com> wrote:
>>
>> Here in apr_buffer_str_set() we still have apr_ssize_t len, please
>> make it an apr_size_t for the reason explained in the other thread
>> (also the apr_buffer_mem_*() ones use apr_size_t already so it should
>> be consistent)?
>
> The apr_ssize_t is intended in this case.
>
> As per the docs, an apr_ssize_t of -1 means compute the length:
>
> https://github.com/apache/apr/blob/trunk/include/apr_buffer.h#L46
>
> /**
>  * When passing a string to apr_buffer_str_make, this value can be
>  * passed to indicate a string with unknown length, and have 
> apr_buffer_str_make
>  * compute the length automatically.
>  */
> #define APR_BUFFER_STRING     (-1)

I've got that, but it's still a valid value to pass in a size_t, which
with sign extension will become (apr_size_t)-1, i.e. APR_SIZE_MAX.
If len is a size_t it's still perfectly defined and works to pass it -1, though:
 #define APR_BUFFER_STRING  ((apr_size_t)-1) /* or APR_SIZE_MAX */
would make it an unsigned value and is probably cleaner.

>
> https://github.com/apache/apr/blob/trunk/include/apr_buffer.h#L125
>
>  * @param len The length of the string without terminating zero, or
>  * APR_BUFFER_STRING to have the length calculated.
>
> The definition of ssize_t is "unsigned length or -1", so we're using it 
> correctly.

The definition is not really "unsigned length or -1", it's _at least_
greater or equal to -1, but it has to be a signed type to be able to
represent -1 (in practice it's the same as ptrdiff_t, no real
restriction/trap when below -1).

But anyway, the point is that you cannot pass strlen() or any size_t
to a ssize_t without first checking that it's representable in a
ssize_t, so if len is defined as ssize_t the onus is on the user to
verify that the passed in value (possibly and likely an
unsigned/size_t which is what users use for sizes) fits in a ssize_t
*before* calling apr_buffer_str_set/make(). And then the checks in
apr_buffer_str_set/make() for len > APR_BUFFER_MAX are just useless,
the compiler might just have assumed that since it's undefined to pass
an unrepresentable value then len > APR_BUFFER_MAX just cannot happen
and it simply removes/optimizes out the test..

>
> The purpose of the apr_buffer is to make a stark and impossible to mix up 
> distinction between a string and a memory area, apr_buffer_src() and 
> apr_buffer_mem() definitely shouldn't be the same.

>>>
>>>     if (!str) {
>>>         buf->d.str = NULL;
>>>         buf->size = 0;
>>> +        buf->zero_terminated = 0;
>>>     }
>>>     else if (len < 0) {
>>>         apr_size_t slen = strlen(str);
>>>         if (slen <= APR_BUFFER_MAX) {
>>>             buf->d.str = str;
>>> -            buf->size = -(apr_off_t)(slen + 1);
>>> +            buf->size = slen;
>>> +            buf->zero_terminated = 1;
>>>         }
>>>         else {
>>>             buf->d.str = NULL;
>>>             buf->size = 0;
>>> +            buf->zero_terminated = 0;
>>
>> return APR_EINVAL?
>
> A NULL buffer is valid and intended.

But here it becomes NULL because len > APR_BUFFER_MAX, not because the
user passed NULL, so it's an error?

>>
>> Overall I think it could be simplified as:
>>
>> APR_DECLARE(apr_status_t) apr_buffer_str_set(apr_buffer_t *buf,
>>                                             char *str, apr_size_t len)
>> {
>>    if (len == APR_BUFFER_STRING) {
>>        len = str ? strlen(str) : 0;
>>    }
>>    else if (str ? str[len] != '\0' : len != 0) {
>>        return APR_EINVAL;
>>    }
>>    if (len > APR_BUFFER_MAX) {
>>        return APR_EINVAL; /* or APR_ENOSPC? */
>>    }
>>
>>    buf->d.str = str;
>>    buf->size = len;
>>    buf->zero_terminated = 1;
>>
>>    return APR_SUCCESS;
>> }
>
> This breaks when src is NULL.

What breaks? If src (actually str) is NULL then ->d.str is NULL but
it's what you want AFAICT (plus the above also enforces that len is
zero in this case).
Or do you mean ->zero_termined is broken? All right, so how about:
    buf->zero_terminated = (str != NULL);
?

>>
>> Likewise apr_ssize_t => apr_size_t len for apr_buffer_str_make(),
>> APR_BUFFER_STRING and co should be unsigned (e.g. APR_SIZE_MAX) too.
>
> We already use -1 to mean "compute the length" in various parts of APR, such 
> as apr_hash. We should keep the behaviour the same as the rest of APR.
>
> If "compute the length" was APR_SIZE_MAX, we couldn't distinguish between 
> "compute the length" and "too big".

I don't object to -1, I object to the signed type. APR_BUFFER_STRING =
-1/APR_SIZE_MAX is just a special value just like APR_HASH_KEY_STRING.
I which we didn't use apr_ssize_t for the apr_hash or apr_encode APIs
but that's where we are, let's not introduce a new one?

>>>
>>> APR_DECLARE(apr_size_t) apr_buffer_allocated(const apr_buffer_t *buf)
>>
>> apr_buffer_size() if ->size is changed to ->len ?
>
> apr_buffer_allocated() means physical size, apr_buffer_len() means logical 
> length. One you care about when you're interested in the logical data, the 
> other you care about when you're copying the physical data, for example when 
> you're writing it to shared memory or some other system with special 
> requirements.

All right, my (non-english native) thinking is that "physical size"
can just be called "size" (of the string/mem) and that "logical
length" just "length" (of the string/mem), thus that the ->size member
of the buffer is actually a length (not including the trailing NUL for
string buffers) and should be called ->len. But it's quite nitpicking
admittedly.

>>
>> Maybe we shouldn't check/accept src == NULL (and let it crash), and
>> alloc == NULL (it's a simple memcpy functionally, the user can do it
>> already), so just do the deep copy (apr_buffer_copy() =>
>> apr_buffer_clone() ?)
>
> We need the ability to store a NULL in a buffer, that's a valid value.

I understand that you want the ability for buf->d.str/mem to be NULL,
what I mean is that buf should not be NULL.
If it is, no defensive programming, just let it crash. For me src ==
NULL does not mean I want dst->d.mem = NULL.

>>
>> APR_DECLARE(int) apr_buffer_cmp(const apr_buffer_t *src,
>>                                const apr_buffer_t *dst)
>> {
>>    apr_size_t slen, dlen;
>>
>>    if (!src) {
>>        return dst ? 1 : 0;
>>    }
>>    if (!dst) {
>>        return -1;
>>    }
>>    if (src->size != dst->size) {
>>        return src->size < dst->size ? -1 : 1;
>>    }
>>
>>    return memcmp(src->d.mem, dst->d.mem, slen);
>> }
>>
>> Though I still don't think that we should handle NULLs here (let it crash..).
>
> NULL support is intended.

Same response as above, it should be allowed for str/mem but not for
src/dst IMHO.

>>>
>>> @@ -344,17 +319,17 @@ APR_DECLARE(char *) apr_buffer_pstrncat(
>>>             size += seplen;
>>>         }
>>>
>>> -        if (src->size < 0) {
>>> -            size += (apr_size_t)(-src->size) - 1;
>>> +        if (src->zero_terminated) {
>>> +            size += src->size;
>>
>> Why not honor APR_BUFFER_BASE64 if it's a string buffer?
>
> Base64 is something you do to binary data, not strings

It's possibly what you do, but I could want to base64 a string too.

> - type safety is the point behind the api.

Yes! So let's move to apr_size_t len :)

>
> The LDAP api returns data as either a string, or a binary blob. When you're 
> trying to log buffers, you want the string to stay a string, and the binary 
> blob to become a string safely.
>
> When you're tasked with creating an LDAP filter, you want your strings to 
> stay strings, and your binary blobs to be base64 encoded.
>
> The plan is to support the whole range of encodings, because we can.

I don't think the apr_buffer API should be bound to LDAP, though I'm
not opposed to it being helpful for LDAP too with the
apr_buffer_pstrncat() flags. But if users ask for APR_BUFFER_BASE64 we
should not ignore them because the buffer is a string.

>>>
>>> +    /** is the data zero terminated */
>>> +    unsigned int zero_terminated:1;
>>> +
>>> +    /** size of the data, excluding terminating zero */
>>> +#if defined(SIZE_MAX) && SIZE_MAX == APR_UINT64_MAX
>>> +    apr_size_t size:63;
>>> +#elif defined(SIZE_MAX) && SIZE_MAX == APR_UINT32_MAX
>>> +    apr_size_t size:31;
>>> +#else
>>> +#error sizeof size_t is neither 64 nor 32 bit (SIZE_MAX not defined)
>>> +#endif
>>
>> Is sizeof(apr_buffer_t) still 8 (32bit) or 16 (64bit) with
>> ->zero_terminated and ->size in two bitmaps of different type/width
>> (unsigned int for the former, apr_size_t for the latter)?
>> It would be simpler to have both use apr_size_t IMHO.
>
> I went looking for the formal definition of this, and all I could find is 
> that bits are bits.
>
> In theory there should be no need for a 1 bit wide field and a 31 bit wide 
> field to have the same type, both will pack to a total of 32 bits.

It seems that gcc and probably other compilers do indeed pack
bit-fields regardless of the adjacent types (per [1]), but from
reading here and there it seems to be mostly compiler/implementation
defined, at least not formally/strictly standard defined.
So I would go with a single type for all the bits, an apr_size_t since
the whole is meant to fit in a single one (and I don't see what
unsigned int buys us here).

[1] 
https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Bit-Field-Packing.html


Regards;
Yann.

Reply via email to