[Graham please use plain text emails or at least fix your MUA so that one does not need to re-quote/format your messages when replying in plain text]
On Wed, Apr 24, 2024 at 7:51 PM Graham Leggett <minf...@sharp.fm> wrote: > > On 24 Apr 2024, at 14:03, Yann Ylavic <ylavic....@gmail.com> wrote: >> >> Here in apr_buffer_str_set() we still have apr_ssize_t len, please >> make it an apr_size_t for the reason explained in the other thread >> (also the apr_buffer_mem_*() ones use apr_size_t already so it should >> be consistent)? > > The apr_ssize_t is intended in this case. > > As per the docs, an apr_ssize_t of -1 means compute the length: > > https://github.com/apache/apr/blob/trunk/include/apr_buffer.h#L46 > > /** > * When passing a string to apr_buffer_str_make, this value can be > * passed to indicate a string with unknown length, and have > apr_buffer_str_make > * compute the length automatically. > */ > #define APR_BUFFER_STRING (-1) I've got that, but it's still a valid value to pass in a size_t, which with sign extension will become (apr_size_t)-1, i.e. APR_SIZE_MAX. If len is a size_t it's still perfectly defined and works to pass it -1, though: #define APR_BUFFER_STRING ((apr_size_t)-1) /* or APR_SIZE_MAX */ would make it an unsigned value and is probably cleaner. > > https://github.com/apache/apr/blob/trunk/include/apr_buffer.h#L125 > > * @param len The length of the string without terminating zero, or > * APR_BUFFER_STRING to have the length calculated. > > The definition of ssize_t is "unsigned length or -1", so we're using it > correctly. The definition is not really "unsigned length or -1", it's _at least_ greater or equal to -1, but it has to be a signed type to be able to represent -1 (in practice it's the same as ptrdiff_t, no real restriction/trap when below -1). But anyway, the point is that you cannot pass strlen() or any size_t to a ssize_t without first checking that it's representable in a ssize_t, so if len is defined as ssize_t the onus is on the user to verify that the passed in value (possibly and likely an unsigned/size_t which is what users use for sizes) fits in a ssize_t *before* calling apr_buffer_str_set/make(). And then the checks in apr_buffer_str_set/make() for len > APR_BUFFER_MAX are just useless, the compiler might just have assumed that since it's undefined to pass an unrepresentable value then len > APR_BUFFER_MAX just cannot happen and it simply removes/optimizes out the test.. > > The purpose of the apr_buffer is to make a stark and impossible to mix up > distinction between a string and a memory area, apr_buffer_src() and > apr_buffer_mem() definitely shouldn't be the same. >>> >>> if (!str) { >>> buf->d.str = NULL; >>> buf->size = 0; >>> + buf->zero_terminated = 0; >>> } >>> else if (len < 0) { >>> apr_size_t slen = strlen(str); >>> if (slen <= APR_BUFFER_MAX) { >>> buf->d.str = str; >>> - buf->size = -(apr_off_t)(slen + 1); >>> + buf->size = slen; >>> + buf->zero_terminated = 1; >>> } >>> else { >>> buf->d.str = NULL; >>> buf->size = 0; >>> + buf->zero_terminated = 0; >> >> return APR_EINVAL? > > A NULL buffer is valid and intended. But here it becomes NULL because len > APR_BUFFER_MAX, not because the user passed NULL, so it's an error? >> >> Overall I think it could be simplified as: >> >> APR_DECLARE(apr_status_t) apr_buffer_str_set(apr_buffer_t *buf, >> char *str, apr_size_t len) >> { >> if (len == APR_BUFFER_STRING) { >> len = str ? strlen(str) : 0; >> } >> else if (str ? str[len] != '\0' : len != 0) { >> return APR_EINVAL; >> } >> if (len > APR_BUFFER_MAX) { >> return APR_EINVAL; /* or APR_ENOSPC? */ >> } >> >> buf->d.str = str; >> buf->size = len; >> buf->zero_terminated = 1; >> >> return APR_SUCCESS; >> } > > This breaks when src is NULL. What breaks? If src (actually str) is NULL then ->d.str is NULL but it's what you want AFAICT (plus the above also enforces that len is zero in this case). Or do you mean ->zero_termined is broken? All right, so how about: buf->zero_terminated = (str != NULL); ? >> >> Likewise apr_ssize_t => apr_size_t len for apr_buffer_str_make(), >> APR_BUFFER_STRING and co should be unsigned (e.g. APR_SIZE_MAX) too. > > We already use -1 to mean "compute the length" in various parts of APR, such > as apr_hash. We should keep the behaviour the same as the rest of APR. > > If "compute the length" was APR_SIZE_MAX, we couldn't distinguish between > "compute the length" and "too big". I don't object to -1, I object to the signed type. APR_BUFFER_STRING = -1/APR_SIZE_MAX is just a special value just like APR_HASH_KEY_STRING. I which we didn't use apr_ssize_t for the apr_hash or apr_encode APIs but that's where we are, let's not introduce a new one? >>> >>> APR_DECLARE(apr_size_t) apr_buffer_allocated(const apr_buffer_t *buf) >> >> apr_buffer_size() if ->size is changed to ->len ? > > apr_buffer_allocated() means physical size, apr_buffer_len() means logical > length. One you care about when you're interested in the logical data, the > other you care about when you're copying the physical data, for example when > you're writing it to shared memory or some other system with special > requirements. All right, my (non-english native) thinking is that "physical size" can just be called "size" (of the string/mem) and that "logical length" just "length" (of the string/mem), thus that the ->size member of the buffer is actually a length (not including the trailing NUL for string buffers) and should be called ->len. But it's quite nitpicking admittedly. >> >> Maybe we shouldn't check/accept src == NULL (and let it crash), and >> alloc == NULL (it's a simple memcpy functionally, the user can do it >> already), so just do the deep copy (apr_buffer_copy() => >> apr_buffer_clone() ?) > > We need the ability to store a NULL in a buffer, that's a valid value. I understand that you want the ability for buf->d.str/mem to be NULL, what I mean is that buf should not be NULL. If it is, no defensive programming, just let it crash. For me src == NULL does not mean I want dst->d.mem = NULL. >> >> APR_DECLARE(int) apr_buffer_cmp(const apr_buffer_t *src, >> const apr_buffer_t *dst) >> { >> apr_size_t slen, dlen; >> >> if (!src) { >> return dst ? 1 : 0; >> } >> if (!dst) { >> return -1; >> } >> if (src->size != dst->size) { >> return src->size < dst->size ? -1 : 1; >> } >> >> return memcmp(src->d.mem, dst->d.mem, slen); >> } >> >> Though I still don't think that we should handle NULLs here (let it crash..). > > NULL support is intended. Same response as above, it should be allowed for str/mem but not for src/dst IMHO. >>> >>> @@ -344,17 +319,17 @@ APR_DECLARE(char *) apr_buffer_pstrncat( >>> size += seplen; >>> } >>> >>> - if (src->size < 0) { >>> - size += (apr_size_t)(-src->size) - 1; >>> + if (src->zero_terminated) { >>> + size += src->size; >> >> Why not honor APR_BUFFER_BASE64 if it's a string buffer? > > Base64 is something you do to binary data, not strings It's possibly what you do, but I could want to base64 a string too. > - type safety is the point behind the api. Yes! So let's move to apr_size_t len :) > > The LDAP api returns data as either a string, or a binary blob. When you're > trying to log buffers, you want the string to stay a string, and the binary > blob to become a string safely. > > When you're tasked with creating an LDAP filter, you want your strings to > stay strings, and your binary blobs to be base64 encoded. > > The plan is to support the whole range of encodings, because we can. I don't think the apr_buffer API should be bound to LDAP, though I'm not opposed to it being helpful for LDAP too with the apr_buffer_pstrncat() flags. But if users ask for APR_BUFFER_BASE64 we should not ignore them because the buffer is a string. >>> >>> + /** is the data zero terminated */ >>> + unsigned int zero_terminated:1; >>> + >>> + /** size of the data, excluding terminating zero */ >>> +#if defined(SIZE_MAX) && SIZE_MAX == APR_UINT64_MAX >>> + apr_size_t size:63; >>> +#elif defined(SIZE_MAX) && SIZE_MAX == APR_UINT32_MAX >>> + apr_size_t size:31; >>> +#else >>> +#error sizeof size_t is neither 64 nor 32 bit (SIZE_MAX not defined) >>> +#endif >> >> Is sizeof(apr_buffer_t) still 8 (32bit) or 16 (64bit) with >> ->zero_terminated and ->size in two bitmaps of different type/width >> (unsigned int for the former, apr_size_t for the latter)? >> It would be simpler to have both use apr_size_t IMHO. > > I went looking for the formal definition of this, and all I could find is > that bits are bits. > > In theory there should be no need for a 1 bit wide field and a 31 bit wide > field to have the same type, both will pack to a total of 32 bits. It seems that gcc and probably other compilers do indeed pack bit-fields regardless of the adjacent types (per [1]), but from reading here and there it seems to be mostly compiler/implementation defined, at least not formally/strictly standard defined. So I would go with a single type for all the bits, an apr_size_t since the whole is meant to fit in a single one (and I don't see what unsigned int buys us here). [1] https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Bit-Field-Packing.html Regards; Yann.