Hi Bruno,

On Fri, Sep 19, 2025 at 03:12:52PM +0200, Bruno Haible wrote:
> Hi Alejandro,
> 
> > > [CCing sc22wg14]
> > 
> > [I've removed WG14, as I think planning outside of their sight can be
> > more productive.]
> 
> That's OK. I just wanted to make the committee aware that good naming is
> important and that the names in the current proposal are not good naming,
> IMO.

They didn't receive your mail.  The list is members-only.  But I'll
reflect this in the proposal when I present it.

> > Since gnulib would be the first implementor, and my proposal won't be
> > voted on until February or March, gnulib has the chance to influence the
> > name.
> > 
> > If you implement them as XXXprefix() and XXXsuffix(), I'll change my
> > proposal, due to prior art.
> > 
> > The committee, a priori, would refuse such common names, due to fears of
> > breaking much existing code.  However, if major existing libc
> > implementations use them, it might turn the votes.
> > 
> > If you start with strprefix()/suffix(), and then glibc may follow, we
> > have two of the major libc implementations.  musl and Bionic would
> > probably follow for compatibility.
> > 
> > But this must come from implementations.  The committee will not help.
> 
> Thanks for the insight. (I don't have much experience with WG14.)

Think of the committee as a group of people that live in an Ivory Tower
and prefer not fixing mistakes, for fear of the image that admitting
mistakes would give to users.  It's actually not a bad model.

> The major reason that speaks for the name 'str_startswith', as used
> in Gnulib, is:
>   - The term starts-with is already used for this purpose in
>     10 out of 15 programming languages. Find attached a summary from
>     the prior knowledge summarization engine.
>   - The prefix 'str' shows that it's for 'char *' strings and allows
>     a similar function with prefix 'wcs' for 'wchar_t *' wide strings.

The second point isn't exclusive to str_startswith.  stRprefix() has the
same property.

The etymology for the name stPprefix() comes from the stp* (POSIX; e.g.,
stpcpy(3)) and memp* (GNU, BSD; e.g. mempcpy(3)) families of functions,
which are string functions that return an offset pointer.

Since these return an offset pointer, the stp* prefix is quite
appropriate for stpprefix() and stpsuffix().
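
To illustrate what "returns an offset pointer" means here, a minimal
sketch.  The names my_stpprefix() and my_stpsuffix() are hypothetical,
and the exact semantics (NULL on mismatch, a pointer past the prefix or
to the start of the suffix on match) are assumptions for illustration,
not the text of the proposal:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: if s starts with prefix, return a pointer to the
   first character of s after the prefix; otherwise return NULL.  */
static char *
my_stpprefix(const char *s, const char *prefix)
{
	size_t  n = strlen(prefix);

	if (strncmp(s, prefix, n) != 0)
		return NULL;
	return (char *) s + n;
}

/* Hypothetical sketch: if s ends with suffix, return a pointer to where
   the suffix starts within s; otherwise return NULL.  */
static char *
my_stpsuffix(const char *s, const char *suffix)
{
	size_t  slen = strlen(s);
	size_t  n = strlen(suffix);

	if (n > slen)
		return NULL;
	if (strcmp(s + slen - n, suffix) != 0)
		return NULL;
	return (char *) s + slen - n;
}
```

As with strchr(3), the cast drops const from the return value, so the
caller is responsible for not writing through a pointer derived from a
read-only string.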

About names from other languages, I don't necessarily like that
reasoning.  We almost had _Countof() named _Lengthof() because committee
members wanted that name from other languages which have it as length.
That would have been harmful, as it would have mixed sizes and lengths
in string-handling code, promoting off-by-one bugs.  In general, I'd
take other languages with a pinch of salt.

Consistency with libc is important too.  If I see str_startswith(), I
would guess it's a project-specific API.  If I see strprefix(), I guess
it's a libc API.  That's because the reserved prefix is str* followed by
lowercase.  And even without the reservation, we have no such names in
string.h, so instinct also plays a role.

> Therefore, I think we should stay with that for Gnulib. No 'strprefix'
> or such.
> 
> Then, let's look for a readable and pronounceable name for the variant
> that returns a pointer.
> 
>   - I tried thinking in terms of parsing, i.e. "str_parse_prefix",
>     but what about the one with suffix then? "str_backwardparse_suffix"?
>     Backward parsing is rarely seen in code.

That induces confusion.  Does the function reverse the string as if by
rev(1) before searching?

>   - How about the names 'str_prefix_end' and 'str_suffix_start'?
>     It's descriptive.
>     It's pronounceable.
>     The names imply that they return a pointer.
>     The names are not very long, compared to what we already find in ISO C:
> 
>       str_prefix_end
>       str_suffix_start
>       fegetexceptflag
>       fe_dec_getround
>       fmaximum_mag_num
>       decodebind128
>       atomic_compare_exchange
>       atomic_flag_test_and_set
>       atomic_flag_test_and_set_explicit
>       stdc_first_leading_zero_ull
>       memset_explicit

However, we also need to compare them to <string.h>.  They're part of
the strchr(3) and strstr(3) family of functions, as they have similar
semantics (although they also are related to streq() in some sense).
Having consistency with their family is important, IMO.

>     They are not used at all [1][2] in existing code, so not a hindrance
>     for WG14.
> 
> Bruno
> 
> [1] https://codesearch.debian.net/search?q=str_prefix_end&literal=1
> [2] https://codesearch.debian.net/search?q=str_suffix_start&literal=1

On Fri, Sep 19, 2025 at 03:14:52PM +0200, Bruno Haible wrote:
> Paul Eggert wrote:
> > Not sure I like having bool variants that are trivial wrappers for the
> > pointer variants, though. Life is already complicated enough.
>
> Unlike Paul, I don't mind having two different functions
>   - str_startswith, that returns 'bool',
>   - str_prefix_end, that returns 'char [const] *'.
> They have different purposes. That justifies the different names.
>
> Having only the function that returns the pointer and telling the
> programmers to use this function when in fact they want a 'bool'
>
>   * either leads to code like
>       if (str_prefix_end (str, p) != NULL)
>     which is longer and less expressive than
>       if (str_startswith (str, p))

In shadow-utils, I ended up using implicit conversions for if()
conditionals:

        if (strchr(...))
        if (!strchr(...))

and for while() loop conditions we do this:

        while (NULL != (p = strchr(...)))

It's quite readable and relatively consistent.
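
A self-contained sketch of both idioms with strchr(3).  The helper
names has_char() and count_char() are made up for this example and are
not taken from shadow-utils:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* if() conditional: rely on the implicit pointer-to-bool conversion.  */
static bool
has_char(const char *s, int c)
{
	if (strchr(s, c))
		return true;
	return false;
}

/* while() loop condition: assign, then compare against NULL explicitly.  */
static size_t
count_char(const char *s, int c)
{
	size_t      n = 0;
	const char  *p = s;

	/* strchr() also finds the terminating null byte, so searching
	   for '\0' would step past the end of the string below.  */
	if (c == '\0')
		return 0;

	while (NULL != (p = strchr(p, c))) {
		n++;
		p++;  /* Resume the search after this match.  */
	}
	return n;
}
```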

>   * or introduces implicit conversions from pointer to bool
>       if (str_prefix_end (str, p))
>     which some people abhor.

On the one hand, yeah, I can agree.  On the other hand, we have
strchr(3), strpbrk(3), strstr(3), and all that family of functions.
We don't have bool-returning variants of those.  There's just a boolean-
like pointer function, and users need implicit conversions.

Since strprefix() and strsuffix() are essentially part of that same
family, consistency is an important factor.

One benefit of these dual-purpose functions is that you learn one
function only, but have both functionalities.  If there were two
variants, and the one returning an offset pointer were stp*, then
the similar names would let programmers just remember that stp* is the
one with an offset pointer, and the other one is the basic one.  But
entirely different names are cognitively problematic.

Here I agree with Paul, and prefer a single API.

I am not sure if I would have preferred libc to have separated the
strchr(3) and strstr(3) family of functions into bool-returning str*
functions and offset-pointer-returning stp* variants.  But given the
status quo, I'd slightly prefer following it.

Projects that don't like <string.h> conventions can write their own
wrappers that perform bool conversions.
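
Such a wrapper is a one-liner.  A sketch over the existing strchr(3)
and strstr(3), where the names my_str_has_char() and
my_str_has_substr() are hypothetical project-local names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical project-local wrapper: bool view of strchr(3).  */
static inline bool
my_str_has_char(const char *s, int c)
{
	return strchr(s, c) != NULL;
}

/* Hypothetical project-local wrapper: bool view of strstr(3).  */
static inline bool
my_str_has_substr(const char *haystack, const char *needle)
{
	return strstr(haystack, needle) != NULL;
}
```

The same pattern would apply to a pointer-returning prefix or suffix
function, whatever it ends up being called.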


-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
