Re: [PATCH] [RFC] Delayed parsing for bounds safety attributes

Aaron Ballman Thu, 24 Jul 2025 07:08:57 -0700

On Wed, Jul 23, 2025 at 8:38 PM Martin Uecker <ma.uec...@gmail.com> wrote:
>
> On Wednesday, 23.07.2025 at 11:53 +0000, Aaron Ballman wrote:
> > On Wed, Jul 23, 2025 at 5:47 AM Martin Uecker <ma.uec...@gmail.com> wrote:
> > >
> >
> > This is a personal stance of mine, not a Clang community response.
> >
> >
> > > But this requires true collaboration, which can not
> > > exist when one side is not able to compromise. What
> > > happens next time there is a disagreement?  Will clang
> > > again try to force its decision on the rest of us?
> >
> > True collaboration goes two ways and the stream of acerbic, unhelpful
> > accusations like this destroys a lot of people's interest in wanting to
> > help find a solution here. The word "toxic" has come up around this
> > topic within the Clang community and I don't blame people for walking
> > away (if I wasn't lead maintainer, I'd have done so despite my
> > personal interest in seeing C become a safer language). If you're
> > interested in working together across communities, maybe don't
> > continue to post these kinds of unconstructive comments?
>
> Ok, sorry, if I misunderstood the clang position. If there
> is indeed room for a compromise, then let's try to work on
> a compromise.


Again, personal thoughts, not Clang community response (I really wish
I had a better mechanism for distinguishing between the two, lol).

I think there's room for compromise (at least I reallllllly hope there
is). The stated Clang community position is:

* late parsing: has consensus from the original bounds safety RFC
* GCC forward declared parameters: has consensus against
* all other options: neither consensus for nor consensus against

And if new information arrives on late parsing or forward declared
parameters, consensus can change (but changing consensus is usually a
hurdle, like in any sort of committee).

> > That said, John McCall pointed out some usage patterns Apple has with
> > their existing feature:
> >
> > * 655 simple references to variables or struct members: __counted_by(len)
> > * 73 dereferences of variables or struct members: __counted_by(*lenp)
> > * 80 integer literals: __counted_by(8)
> > * 60 macro references: __counted_by(NUM_EIGHT) [1]
> > * 9 simple sizeof expressions: __counted_by(sizeof(eight_bytes_t))
> > * 28 others my script couldn’t categorize:
> >   * 7 more complicated integer constant expressions:
> > __counted_by(num_bytes_for_bits(NUM_FIFTY_SEVEN)) [2]
> >   * 16 arithmetically-adjusted references to a single variable or
> > struct member: __counted_by(2 * len + 8)
> >   * 1 nested struct member: __counted_by(header.len)
> >   * 4 combinations of struct members: __counted_by(len + cnt) [3]
> >
> > Do the Linux kernel folks think this looks somewhat like what their
> > usage patterns will be as well? If so, I'd like to argue for my
> > personal stake in the ground: we don't need any new language features
> > to solve this problem, we can use the existing facilities to do so and
> > downscope the initial feature set until a better solution comes along
> > for forward references. Use two attributes: counted_by (whose argument
> > specifies an already in-scope identifier of what holds the count) and
> > counts (whose argument specifies an already in-scope identifier of
> > what it counts). e.g.,
> > ```
> > struct S {
> >   char *start_buffer;
> >   int start_len __counts(start_buffer);
> >   int end_len;
> >   char *end_buffer __counted_by(end_len);
> > };
> >
> > void func(char *start_buffer, int N __counts(start_buffer), int M,
> >           char *end_buffer __counted_by(M));
> > ```
> > It's kind of gross to need two attributes to do the same notional
> > thing, but it does solve the vast majority of the usages seen in the
> > wild if you're willing to accept some awkwardness around things like:
> > ```
> > struct S {
> >   char *buffer;
> >   int *len __counts(buffer); // Note that len is a pointer
> > };
> > ```
> > because we'd need the semantics of `counts` to include dereferencing
> > to the `int` in order to be a valid count. We'd be sacrificing the
> > ability to handle the "others my script couldn't categorize", but
> > that's 28 out of the 905 total cases and maybe that's acceptable?
>
> So what do you think about the solution Qing mentioned:
>
> struct {
>   char *buf __counted_by_expr(int len; len + 7);
>   int len;
> };
>
> which would be very flexible and support all possible use cases
> and has no parsing or semantic interpretation issues.

Personally, I'm not excited by it because one of the big sticking
points in the Clang community is shared header files with C++. Because
these attributes are used on structures and functions, the two most
common things you'll find in a shared header file, we *really* want
the feature to be workable in both languages to the greatest extent
possible. And once we care about C++, things get so much harder due to
the extra complexity it brings. So, for example, we'd have to figure
out how to handle things like:
```
template <typename Ty>
struct S {
  char *buffer __counted_by_expr(Ty len; len + 7);
  int len; // Oooooops
};

template <typename Ty, typename Uy>
struct T {
  char *buffer __counted_by_expr(Ty len; len + 7);
  Uy len; // Grrrrr
};
```
I think it's possible to handle these situations, but we'd have to sit
down and think through all the edge cases and whether we can handle
them all with some reasonable QoI. I think we'd ultimately run into
the same concerns here as we ran into with forward declared
parameters. I think the reason folks in Clang are more comfortable
with late parsing is because it means the user doesn't have to repeat
the type and the name, which makes for fewer chances for the user to
screw things up and get into these weird situations. There can be
other weird situations with late parsing too, of course, but I think
the scope of those edge cases is a bit narrower.
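
To make the "repeat the type and the name" concern concrete, here is a minimal C sketch of the two spellings side by side. `COUNTED_BY` and `COUNTED_BY_EXPR` are hypothetical stand-in macros that expand to nothing (they are not the real attributes), and the helper functions are made up for illustration. They only spell out what the annotation is meant to say, assuming `len` is non-negative.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the attributes under discussion. They expand
   to nothing so that this sketch compiles with any C compiler today. */
#define COUNTED_BY(expr)
#define COUNTED_BY_EXPR(...)

/* Late-parsing spelling: `len` is only ever declared once. */
struct late_parsed {
  char *buf COUNTED_BY(len + 7);
  int len;
};

/* Repeated-declaration spelling: the type of `len` is written twice, so
   the two copies can silently disagree (`long` here, `int` below). */
struct repeated_decl {
  char *buf COUNTED_BY_EXPR(long len; len + 7);
  int len;
};

/* The element count that the annotation on `buf` is meant to describe. */
size_t late_parsed_count(const struct late_parsed *s) {
  return (size_t)s->len + 7;
}

/* The check a bounds-safety implementation would conceptually perform. */
int late_parsed_in_bounds(const struct late_parsed *s, size_t index) {
  return index < late_parsed_count(s);
}
```

In `struct repeated_decl` the compiler would have to diagnose or reconcile the `long`/`int` mismatch, which is the plain C version of the template cases above. The late-parsed spelling has no second declaration to get wrong.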

The other downside is that we have more attributes that need to
support something similar, like the thread safety attributes (which I
believe is also an important use case for the Linux kernel folks?). We
could do this dance on a per-attribute basis, but if the solution
worked for all attributes *and* array extents at the same time, that
would be nice. Not certain it's a requirement though.

> The thing is that WG14 had (weak) consensus for parameter
> forward declarations and I think more consensus for [.N]
> syntax in structures already.  So I had hoped that we will be
> able to make progress on this.

Question on the .N syntax: I thought I heard that this was something
GCC could handle, but that it still requires late parsing to ensure
type information for N is available and that was a problem. e.g.,

void func(char *buffer __counted_by(.N * sizeof(.N)), int N);

where we'd need to know both the name and the type. Am I wrong about
that being a challenge for GCC to support? If so, I think it may be
plausible in Clang (implementation-wise, if we can handle late parsing
without the dot, it sure seems like handling it with the dot won't be
any harder). Whether the community will go for it or not, I'm not
certain, but if GCC can support it, I can try to sell it to Clang
folks as a good compromise.
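
As a rough illustration of why both the name and the type matter in that example, the sketch below evaluates the same expression by hand for two possible declarations of `N`. `COUNTED_BY` is a hypothetical no-op macro standing in for the attribute, and the two helper functions are invented for this sketch. Nothing here reflects how either compiler would actually implement it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the attribute; expands to nothing so the
   declaration below is accepted by compilers without `.N` support. */
#define COUNTED_BY(expr)

/* The declaration from above, with the dotted forward reference. */
void func(char *buffer COUNTED_BY(.N * sizeof(.N)), int N);

/* What `.N * sizeof(.N)` would evaluate to with N declared as `int`. */
size_t count_if_int(int N) { return (size_t)N * sizeof(N); }

/* The same expression if N had instead been declared as `long long`. */
size_t count_if_long_long(long long N) { return (size_t)N * sizeof(N); }
```

The two helpers return different counts for the same value wherever `int` and `long long` differ in size, so the expression cannot be fully evaluated until the later declaration of `N` has been seen.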

~Aaron
