I posted this on the LLVM Discourse forum[1] and got some traction, so I want to get the GCC community's input. (My initial proposal is replicated here.)
I had already mentioned this in previous emails in this thread, so it's nothing super new, and there have been some suggested improvements already. Parts of this reference a meeting that took place between the LLVM developers and some non-LLVM developers. The meeting mostly explained the issues regarding the "compromise" from this thread and how it interacts (poorly) with C++, and vice versa. There was a lengthy discussion after this proposal. Please take a look and let me know what you think.

-bw

[1] https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885/32?u=void

--------------------------------------------------

I’ve been putting off pushing this proposal, because it is a departure from what Apple has done and adds a lot of extra syntax for this feature, but I think it’s appropriate right now.

The main issue at play is that C and C++ are two very different languages. The scoping rules are completely different, so name resolution does not work in one language without jumping through non-obvious hoops. This was made clear in @rapidsna’s presentation last week. Making matters worse, GCC (and other compilers) perform one-pass parsing for C, making forward declarations necessary. The forward declarations, while solving many issues, have their own issues. Other solutions at play require changes to the base languages, which require approval by the standards committee. Even if the full struct were declared before the expression in the attribute was defined, there would still be issues, as shown by one example from @rapidsna’s presentation [as pointed out by Joseph Jelinek]:

    typedef int T;
    struct foo {
        int T;
        int U;
        int * __counted_by_expr(int T; (T)+U) buf; // Addition or cast?
    };

Given this, I want to propose using functions / static methods for expressions. The function takes one and only one argument: a "this" pointer to the least enclosing non-anonymous struct.
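As an aside, the (T)+U ambiguity from the presentation example can be reproduced in plain C today, without any new attribute. The sketch below is only an illustration: the function names parsed_as_cast and parsed_as_addition are made up for this email, and it uses a local variable rather than a struct field to hide the typedef.

```c
typedef int T; /* at file scope, T names a type */

/* T is still the typedef here, so (T)+u is a cast applied to
 * the unary expression +u, which truncates toward zero. */
static double parsed_as_cast(double u)
{
    return (T)+u;
}

/* A local object named T hides the typedef in this scope, so the
 * very same tokens (T)+u are now a binary addition. */
static double parsed_as_addition(double u)
{
    int T = 10;
    return (T)+u;
}
```

With an argument of 2.5, the first function returns 2.0 (a cast to int) and the second returns 12.5 (10 + 2.5). The tokens are identical; only the scope in which T is looked up differs, which is exactly the decision the attribute expression would force on the parser.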
The call to the function is generated by the compiler, so no argument is passed explicitly; the attribute only needs to indicate the function’s name. This avoids the need to add a new __builtin_* or __self element to C.

* The function needs to be declared before use in C. (It can be fully defined if no fields within the struct are used.)
* The function should be static and marked as pure (and maybe always_inline).
* The function in C++ should be private or protected.

C example:

    static size_t calc_counted_by(void *);

    struct foo {
        /* ... */
        struct bar {
            int * __counted_by_expr(calc_counted_by) buf;
            int count;
            int scale;
        };
    };

    enum { OFFSET = 42 };

    // The function could be marked with the 'pure' attribute.
    static size_t __pure calc_counted_by(void *p) {
        struct bar *ptr = (struct bar *)p;
        return ptr->count * ptr->scale - OFFSET;
    }

C++ example:

    struct foo {
        enum { OFFSET = 42 };
        struct bar {
            int * __counted_by_expr(calc_counted_by) buf;
        private:
            static size_t __pure calc_counted_by(struct bar *ptr) {
                return ptr->count * ptr->scale - OFFSET;
            }
        public:
            int count;
            int scale;
        };
    };

Pros

1. This uses the current language without any modifications to scoping or requiring feature additions that need to be approved by the standards committee. All compilers should be able to implement them without major modifications.
2. Name lookup is no longer a problem, so there isn’t a need for forward declarations or trying to determine which scope to use in various circumstances.
3. In the general case where the full struct is passed into the calculating function, both C and C++ parse the code in the same way. The C example above would need to be modified to this:

    static size_t __pure calc_counted_by(void *p) {
    #ifdef __cplusplus
        foo::bar *ptr = static_cast<foo::bar *>(p);
    #else
        struct bar *ptr = (struct bar *)p;
    #endif
        return ptr->count * ptr->scale - OFFSET;
    }

This format can be extended to other languages if need be.

Cons

1. It’s wordy, which may make it unappealing to users.
2. The #ifdef __cplusplus ... #endif usage above is wordy and a bit awkward.
3. Importantly, it’s harder for Apple’s bounds safety work to analyze the fields used within the expression.
4. Apple and their users already use the current syntax.

For (1), that’s an unfortunate outcome of this feature. There may be ways to reduce the amount of code that needs to be written, but the above is a good starting place. [Note: Kees came up with a way to avoid the forward declaration of the function: have the compiler generate the forward declaration with a set declaration syntax, e.g. static __pure size_t size_calculation(struct foo *);]

For (2), the rule about using the least enclosing non-anonymous struct could be loosened and the whole struct passed in. The user has full control over which fields to use.

For (3), it’s harder to get the expression because it’s within the function, but that function is available in the AST, so getting its contents shouldn’t be impossible. (I don’t mean to shrug off this concern as I haven’t seen the code. If I’m completely off base here please tell me.)

For (4), this is a large sticking point. There are two options that I can think of:

1. Allow Apple users to keep the current syntax, because Apple’s platform doesn’t support GCC, and/or
2. Use clang-tidy to convert the old syntax to the new syntax.

I don’t think either option is better than the other, though (1) does involve supporting two different code paths for the same feature.

In conclusion

My overriding concern from the beginning is that both GCC and Clang end up with the same (or similar) syntax for these features so that they can be applied equally to Linux (and one assumes other projects). None of the suggested syntaxes or solutions presented so far satisfy all requirements.
Using a function to calculate the size relies only on base language features, doesn’t require changing any language, doesn’t require support from a standards committee, and can be supported by both compilers (I even have a branch that implements a simplified version for Clang).

Share and enjoy!

-bw