On Wed, Jun 25, 2025 at 1:37 PM Bill Wendling <isanb...@gmail.com> wrote:
>
> I posted this on the LLVM Discourse forum[1] and got some traction, so
> I want to get the GCC community's input. (My initial proposal is
> replicated here.)
>
> I had already mentioned this in previous emails in this thread, so
> it's nothing super new, and there have been some suggested
> improvements already. Parts of this reference a meeting that took
> place between the LLVM developers and some non-LLVM developers. The
> meeting mostly explained the issues regarding the "compromise" from
> this thread and how it interacts (poorly) with C++, and vice versa.
>
> There was a lengthy discussion after this proposal.
>
> Please take a look and let me know what you think.
>
There are a couple of notes to add to this proposal:

1. The suggestion to omit the forward decl of the function is more of
a nicety than a requirement. I could be overstating how much it
would annoy a programmer to have to write:

struct b;
static __pure size_t calc(struct b *);

struct b {
  int *buf __counted_by(calc);
  int count;
};

static __pure size_t calc(struct b *p) {
  // do something.
}

rather than have the compiler do the forward decls for you.
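
For anyone who wants to try the shape of this today, here is a minimal
sketch in plain C. It assumes no compiler support at all: __counted_by is
stubbed out as an empty macro (no compiler is claimed to accept a function
name there), __pure is mapped to the GCC/Clang 'pure' attribute, and the
body of calc() plus the helper names b_at() and b_selfcheck() are made up
for illustration. The call the compiler would generate is written by hand
in b_at().

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: stand-ins so the sketch builds with today's GCC/Clang.
   The #undef lines guard against platforms that already define these. */
#undef __pure
#define __pure __attribute__((pure))
#undef __counted_by
#define __counted_by(fn) /* expands to nothing */

struct b;
static __pure size_t calc(struct b *);

struct b {
  int *buf __counted_by(calc);
  int count;
};

/* Made-up body: the element count is the 'count' field, clamped at 0. */
static __pure size_t calc(struct b *p) {
  return p->count < 0 ? 0 : (size_t)p->count;
}

/* Hand-written version of the check a compiler might emit:
   returns a pointer to buf[i], or NULL if i is out of bounds. */
int *b_at(struct b *p, size_t i) {
  return i < calc(p) ? &p->buf[i] : NULL;
}

/* Small self-check; call it from main() to try the sketch. */
void b_selfcheck(void) {
  int storage[3] = {1, 2, 3};
  struct b x = { storage, 3 };
  assert(b_at(&x, 2) == &storage[2]);
  assert(b_at(&x, 3) == NULL);
}
```

The two declarations before the struct are exactly the boilerplate that
the compiler-generated forward decl would remove.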

2. There was another suggestion on the mailing list to add the
attribute after the struct definition:

enum { OFFSET = 42 };
struct foo {
  int count;
  int *buf;
} __counted_by(count - OFFSET, buf);

It has some merit. The downside is that it loses locality.

-bw

> -bw
>
> [1] https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885/32?u=void
>
> --------------------------------------------------
>
> I’ve been putting off pushing this proposal, because it is a departure
> from what Apple has done and adds a lot of extra syntax for this
> feature, but I think it’s appropriate right now.
>
> The main issue at play is that C and C++ are two very different
> languages. The scoping rules are completely different, so name
> resolution does not work in one language without jumping through
> non-obvious hoops. This was made clear in @rapidsna’s presentation
> last week. Making matters worse, GCC (and other compilers) perform
> one-pass parsing for C, making forward declarations necessary.
> The forward declarations, while solving many issues, have their own
> issues. Other solutions at play require changes to the base languages,
> which require approval by the standards committee.
>
> Even if the full struct was declared before the expression in the
> attribute was defined, there would still be issues, due to one example
> from @rapidsna’s presentation [as pointed out by Joseph Jelinek]:
>
> typedef int T;
> struct foo {
>   int T;
>   int U;
>   int * __counted_by_expr(int T; (T)+U) buf; // Addition or cast?
> };
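
To make the ambiguity concrete outside of any attribute, here is a small
sketch showing that the same tokens, (T)+U, mean two different things
depending on what T names at that point. The function names are mine and
purely illustrative; only the typedef and the expression come from the
example above.

```c
#include <assert.h>

typedef int T;

/* Here T names the typedef, so (T)+U is a cast of +U to int. */
int parsed_as_cast(int U) {
  return (T)+U;
}

/* Here a local variable T hides the typedef, so (T)+U is T + U. */
int parsed_as_addition(int U) {
  int T = 10;
  return (T)+U;
}

/* Small self-check; call it from main() to try the sketch. */
void ambiguity_selfcheck(void) {
  assert(parsed_as_cast(5) == 5);
  assert(parsed_as_addition(5) == 15);
}
```

A one-pass parser has to decide which of the two it is looking at when
it reaches the parenthesis, which is why the lookup rules for names
inside the attribute matter so much.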
>
> Given this, I want to propose using functions / static methods for 
> expressions.
>
> The function takes one and only one argument: a "this" pointer to the
> least enclosing non-anonymous struct.
>
> The call to the function is generated by the compiler, so no argument
> is written; the attribute only needs to indicate the function’s name.
> This avoids the need to add a new __builtin_* or __self element to C.
>
> * The function needs to be declared before use in C. (It can be fully
> defined if no fields within the struct are used.)
> * The function should be static and marked as pure (and maybe always_inline).
> * The function in C++ should be private or protected.
>
> C example:
>
> static size_t calc_counted_by(void *);
> struct foo {
>   /* ... */
>   struct bar {
>     int * __counted_by_expr(calc_counted_by) buf;
>     int count;
>     int scale;
>   };
> };
>
> enum { OFFSET = 42 };
>
> // The function could be marked with the 'pure' attribute.
> static size_t __pure calc_counted_by(void *p) {
>   struct bar *ptr = (struct bar *)p;
>   return ptr->count * ptr->scale - OFFSET;
> }
>
> C++ example:
>
> struct foo {
>   enum { OFFSET = 42 };
>   struct bar {
>     int * __counted_by_expr(calc_counted_by) buf;
>   private:
>     static size_t __pure calc_counted_by(struct bar *ptr) {
>       return ptr->count * ptr->scale - OFFSET;
>     }
>   public:
>     int count;
>     int scale;
>   };
> };
>
> Pros
>
> 1. This uses the current language without any modifications to scoping
> or requiring feature additions that need to be approved by the
> standards committee. All compilers should be able to implement them
> without major modifications.
> 2. Name lookup is no longer a problem, so there isn’t a need for
> forward declarations or trying to determine which scope to use in
> various circumstances.
> 3. In the general case where the full struct is passed into the
> calculating function, both C and C++ parse the code in the same way.
> In the C example above, it would need to be modified to this:
>
> static size_t __pure calc_counted_by(void *p) {
> #ifdef __cplusplus
>   foo::bar *ptr = static_cast<foo::bar *>(p);
> #else
>   struct bar *ptr = (struct bar *)p;
> #endif
>   return ptr->count * ptr->scale - OFFSET;
> }
>
> This format can be extended to other languages if need be.
>
> Cons
>
> 1. It’s wordy, which may make it unappealing to users.
> 2. The #ifdef __cplusplus ... #endif usage above is wordy and a bit awkward.
> 3. Importantly, it’s harder for Apple’s bounds safety work to analyze
> the fields used within the expression.
> 4. Apple and their users already use the current syntax.
>
> For (1), that’s an unfortunate outcome of this feature. There may be
> ways to reduce the amount of code that needs to be written, but the
> above is a good starting place.
>
> [Note: Kees came up with a way to avoid the forward declaration of the
> function---have the compiler generate the forward declaration with a
> set declaration syntax: e.g. static __pure size_t
> size_calculation(struct foo *);]
>
> For (2), the rule about using the least enclosing non-anonymous struct
> could be loosened and the whole struct passed in. The user has full
> control over which fields to use.
>
> For (3), it’s harder to get the expression because it’s within the
> function, but that function is available in the AST, so getting its
> contents shouldn’t be impossible. (I don’t mean to shrug off this
> concern as I haven’t seen the code. If I’m completely off base here
> please tell me.)
>
> For (4), this is a large sticking point. There are two options that I
> can think of:
>
> 1. Allow Apple users to keep the current syntax, because Apple’s
> platform doesn’t support GCC, and/or
> 2. Use clang-tidy to convert the old syntax to the new syntax.
>
> I don’t think either option is better than the other, though (1) does
> involve supporting two different code paths for the same feature.
>
> In conclusion
>
> My overriding concern from the beginning is that both GCC and Clang
> end up with the same (or similar) syntax for these features so that
> they can be applied equally to Linux (and one assumes other projects).
> None of the suggested syntaxes or solutions presented so far satisfies
> all requirements.
>
> Usage of a function to calculate the size uses the base language
> features, doesn’t require changing any language, doesn’t require
> support from a standards committee, and can be supported by both
> compilers (I even have a branch that implements a simplified version
> for Clang).
>
> Share and enjoy!
> -bw
