On Thu, Apr 18, 2024 at 3:58 PM Graham Leggett <[email protected]> wrote:
>
> On 18 Apr 2024, at 13:15, Yann Ylavic <[email protected]> wrote:
>
>> Indeed at least APR_BUFFER_MAX and buf->size of should be of the same
>> signedness.
>
>
> APR_BUFFER_MAX represents the size limit visible outside the API. This is
> always positive.
I meant the same signedness of the type, not the value itself.
If APR_BUFFER_MAX is APR_SIZE_MAX/2 it's a size_t (thus unsigned),
while buf->size is an apr_off_t (thus signed), not the same types.
>
> buf->size is an internal implementation detail. This is invisible.
>
> The only reason we can see inside apr_buffer_t is because we want to allocate
> them on the stack, and make arrays of them, so code outside needs to know the
> size.
Fair enough, though there probably should be big warning a the struct
apr_buffer_t definition to not use ->size directly.
>
>> But let me plead again for a much simpler ->size of type apr_size_t,
>> checked against APR_BUFFER_MAX=APR_SIZE_MAX/2 wherever an apr_buffer_t
>> is initialized, using the high bit of ->size for string vs plain
>> buffer, and then getting rid of off_t/ssize_t plus all the fancy
>> signed arithmetics in the apr_buffer code (we don't care about the
>> sizeof(off_t) or anything like that anymore)..
>
>
> I looked at it, and it was more confusing.
>
> Right now, positive is a memory buffer, negative is a string. I don't know
> why we would need to make it more complicated than that.
I think we can hardly make more complicated than:
APR_DECLARE(apr_size_t) apr_buffer_len(const apr_buffer_t *buf)
{
if (buf->size < 0) {
return (apr_size_t)((-buf->size) - 1);
}
else {
return (apr_size_t)buf->size;
}
}
...
With an apr_size_t ->size it could be:
APR_DECLARE(apr_size_t) apr_buffer_len(const apr_buffer_t *buf)
{
return buf->size & APR_BUFFER_MAX;
}
which I find much more simpler (and efficient) personally.
Another exemple:
APR_DECLARE(apr_status_t) apr_buffer_str_set(apr_buffer_t *buf,
char *str, apr_ssize_t len)
{
if (len > APR_BUFFER_MAX) {
return APR_EINVAL;
}
if (!str) {
buf->d.str = NULL;
buf->size = 0;
}
else if (len < 0) {
apr_size_t slen = strlen(str);
if (slen <= APR_BUFFER_MAX) {
buf->d.str = str;
buf->size = -(apr_off_t)(slen + 1);
}
else {
buf->d.str = NULL;
buf->size = 0;
}
}
else {
buf->d.str = str;
buf->size = -(len + 1);
}
return APR_SUCCESS;
}
could be:
APR_DECLARE(apr_status_t) apr_buffer_str_set(apr_buffer_t *buf,
char *str, apr_size_t len)
{
if (len == APR_BUFFER_STRING) {
len = str ? strlen(str) : 0;
}
if (len > APR_BUFFER_MAX) {
return APR_EINVAL;
}
if (!str) {
buf->d.str = NULL;
buf->size = 0;
}
else {
buf->d.str = str;
buf->size = len;
}
buf->size |= ~APR_BUFFER_MAX;
return APR_SUCCESS;
}
Should be the same kind of simplification everywhere I suppose.
>
>> Currently apr_buffer_str_make(mystring, strlen(mystring)) is UB, the
>> API is just broken IMHO.
>
> Can you clarify why? What, specifically, is broken, so I can fix it?
Passing/converting an unsigned value to a signed one where the
unsigned value cannot be represented in the signed one is undefined
behaviour in C.
So with:
APR_DECLARE(apr_status_t) apr_buffer_str_set(apr_buffer_t *buf,
char *str, apr_ssize_t len);
one cannot simply do (w/o UB):
apr_size_t len = strlen(mystr);
apr_buffer_str_set(buf, mystr, len);
without first checking that len <= APR_BUFFER_MAX, which is not really
user friendly.
Whereas with the proposed:
APR_DECLARE(apr_status_t) apr_buffer_str_set(apr_buffer_t *buf,
char *str, apr_size_t len);
the will return EINVAL if len > APR_BUFFER_MAX without the user having
to do the check to avoid UB.
Regards;
Yann.