Re: [PATCH] [RFC] Delayed parsing for bounds safety attributes

Qing Zhao Tue, 29 Jul 2025 06:49:24 -0700


> On Jul 28, 2025, at 17:39, Martin Uecker <ma.uec...@gmail.com> wrote:
> 
> On Monday, 28.07.2025 at 20:48 +0000, Qing Zhao wrote:
>> 
>>> On Jul 28, 2025, at 16:09, Martin Uecker <ma.uec...@gmail.com> wrote:
>>> 
>>> On Monday, 28.07.2025 at 11:18 -0700, Yeoul Na wrote:
>>>> 
>>>> 
>>>>> On Jul 28, 2025, at 10:27 AM, Qing Zhao <qing.z...@oracle.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 26, 2025, at 12:43, Yeoul Na <yeoul...@apple.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 24, 2025, at 3:52 PM, Kees Cook <k...@kernel.org> wrote:
>>>>>>> 
>>>>>>> On Thu, Jul 24, 2025 at 04:26:12PM +0000, Aaron Ballman wrote:
>>>>>>>> Ah, apologies, I wasn't clear. My thinking is: we're (Clang folks)
>>>>>>>> going to want it to work in C++ mode because of shared headers. If it
>>>>>>>> works in C++ mode, then we have to figure out what it means with all
>>>>>>>> the various C++ features that are possible, not just the use cases
>>>>>>> 
>>>>>>> I am most familiar with C, so I may be missing something here, but if
>>>>>>> -fbounds-safety is intended to be C only, then why not just make it
>>>>>>> unrecognized in C++?
>>>>>> 
>>>>>> The bounds safety annotations must also be parsable in C++. While C++ 
>>>>>> can get bounds checking by using std::span instead of raw pointers, 
>>>>>> switching to std::span breaks ABI. Therefore,
>>>>>> in many situations, C++ code must continue to use raw pointers—for 
>>>>>> example, when interoperating with C code by sharing headers with C. In 
>>>>>> such cases, bounds annotations can help close
>>>>>> safety gaps in raw pointers.
>>>>> 
>>>>> The -fbounds-safety feature was initially proposed as a C extension, so it’s 
>>>>> natural to make it compatible with the C language, not C++. 
>>>>> If C++ also needs such a feature, then an extension to C++ is needed too.
>>>>> If a consistent syntax for this feature can satisfy both C and C++, that 
>>>>> will be ideal.
>>>>> However, if providing such a consistent syntax requires major changes to 
>>>>> the C language (a new name lookup scope and late parsing), it might be a 
>>>>> good idea to provide different syntax for C and C++.
>>>> 
>>>> 
>>>> So the main problem here is when the “same code” is parsed in both 
>>>> C and C++, which is quite common in practice.
>>>> 
>>>> Therefore, we need a way to reasonably write code that works in both C and 
>>>> C++. 
>>>> 
>>>> From my perspective, that means:
>>>> 
>>>> 1. The same spelling doesn’t “silently” behave differently in C and C++.
>>>> 2. At least the most common use cases (i.e., __counted_by(peer)) should be 
>>>> able to be written the same way in C and C++, without ceremony.
>>>> 
>>>> Here is our compromise proposal that meets these requirements, until we 
>>>> get blessing from the standard for a more elegant solution:
>>>> 
>>>> 1. `__counted_by(member)` keeps working as is: late parsing + name lookup 
>>>> finds the member name first
>>>> 2. `__counted_by_expr(expr)` uses a new syntax (e.g., __self), and is not 
>>>> allowed to use a name that matches the member name without the new syntax 
>>>> even if that would’ve resolved to a
>>>> global variable. Use something like  `__global_ref(id)` to disambiguate. 
>>>> This rule will prevent the confusion where `__counted_by_expr(id)` and 
>>>> `__counted_by(id)` may designate different
>>>> entities.
>>>> 
>>>> Here are the examples:
>>>> 
>>>> Ex 1)
>>>> constexpr int n = 10;
>>>> 
>>>> struct s {
>>>>  int *__counted_by(n) ptr; // resolves to member `n`, which matches the current behavior
>>>>  int n;
>>>> };
>>>> 
>>>> Ex 2)
>>>> constexpr int n = 10;
>>>> struct s {
>>>>  int *__counted_by_expr(n) ptr; // error: referring to a member name without "__self."
>>>>  int n;
>>>> };
>>>> 
>>>> Ex 3)
>>>> constexpr int n = 10;
>>>> struct s {
>>>>  int *__counted_by_expr(__self.n) ptr; // resolves to member `n`
>>>>  int n;
>>>> };
>>>> 
>>>> 
>>>> Ex 4)
>>>> constexpr int n = 10;
>>>> struct s {
>>>>  int *__counted_by_expr(__self.n + 1) ptr; // resolves to member `n`
>>>>  int n;
>>>> };
>>>> 
>>>> 
>>>> Ex 5)
>>>> constexpr int n = 10;
>>>> struct s {
>>>>  int *__counted_by_expr(__global_ref(n) + 1) ptr; // resolves to global `n`
>>>>  int n;
>>>> };
>>>> 
>>>> 
>>>> Ex 6)
>>>> constexpr int n = 10;
>>>> struct s {
>>>>  int *__counted_by_expr(n + 1) ptr; // resolves to global `n`; okay, no matching member name
>>>> };
>>>> 
>>>> Or, in case people prefer forward declarations inside 
>>>> `__counted_by_expr()`, a similar rule can apply to achieve the same goal.
>>>> 
>>> 
>>> Thank you Yeoul! 
>>> 
>>> I think it is a reasonable compromise.
>> 
>> Yes, I agree. -:)
>> 
>> It adds two new keywords in both C and C++ (__self and __global_ref) to 
>> explicitly mark the scopes for the variables inside the attribute. 
>> This will definitely resolve the lookup scope ambiguity issue in both C and C++. 
>> 
>> However, it will not resolve the issue when the counted_by field is declared 
>> after the pointer field. 
>> So, forward declarations are still needed to resolve this issue, I think.
> 
> Yes, forward declarations are the simplest solution.
> 
> 
> Another idea I mentioned before is to let __self.N have type 
> int, and then emit an error later if it has  a type that 
> would change the type / meaning of the immediate
> parent expression.


Yes, this is reasonable too.

However, one of the major issues with it is that the user has to change the 
type of all the counted_by fields to “int”. I am not sure whether this is easy 
to do for a large application. 

Qing
> 
> This would allow all of the following:
> 
> struct foo { 
> char * __counted_by_expr(__self.N) buf;
> int N;
> };
> struct foo {
> char * __counted_by_expr(__self.N + 1L) buf;
> long N;
> };
> struct foo {
> char * __counted_by_expr(__self.N * 2) buf;
> int N;
> };
> struct foo {
> char * __counted_by_expr(__self.N + 2) buf;
> char N;
> };
> struct foo {
> char * __counted_by_expr(__self.N + .M) buf;
> int N; int M;
> };
> struct foo {
> char * __counted_by_expr((int)__self.N) buf;
> double N;
> };
> struct foo {
> char * __counted_by_expr(3 * sizeof(__self.buf2)) buf;
> char buf2[5];
> };
> struct foo {
> char * __counted_by_expr(((struct bar *)__self.x)->z) buf; 
> struct bar *x;
> };
> 
> 
> It would *not* allow:
> 
> struct foo {
> char * __counted_by_expr(__self.N + 1) buf;
> long N;
> };
> struct foo {
> char * __counted_by_expr(__self.x->z) buf;
> struct foo *x;
> };
> 
> 
> But in this case you would get an explicit error:
> 
> xyz:13.4: Type of `__self.N' needs to be known.  Did you forget to
> add a cast `(long)__self.N'?
> 
> 
> 
> Martin
>
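
As a rough illustration of the typing rule Martin describes above (treat
`__self.N` as `int` while parsing, then report an error if the real type of N
would change the type of the immediate parent expression), here is a sketch in
standard C11. It does not use `__self` or `__counted_by_expr`, which are only
proposals in this thread. The `TYPE_ID` and `SAME_TYPE` macros and the
`report_cases` function are hypothetical helpers invented for this example.
The sketch only compares the type an expression has under the `int` assumption
with the type it has once the declared field type is known, which is one
possible reading of the rule and not a description of any implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: map the type of an expression to a small tag.
   Only the types needed for this illustration are listed. */
#define TYPE_ID(e) _Generic((e), int: 1, long: 2, double: 3, default: 0)

/* Hypothetical helper: nonzero if the expression typed under the
   "assume int" rule has the same type as the expression typed with
   the field's declared type. */
#define SAME_TYPE(assumed, actual) (TYPE_ID(assumed) == TYPE_ID(actual))

/* Prints one line per case (1 = parent type unchanged, 0 = changed)
   and returns the number of cases whose parent type is unchanged. */
int report_cases(void)
{
    int as_int = 0;         /* stand-in for __self.N assumed to be int */
    long real_long = 0;     /* N declared as long */
    char real_char = 0;     /* N declared as char */
    double real_double = 0; /* N declared as double */

    assert(TYPE_ID(as_int) == 1); /* sanity check on the tag table */

    /* N + 1L with long N: long under both typings */
    int c1 = SAME_TYPE(as_int + 1L, real_long + 1L);
    /* N + 1 with long N: int under the assumption, long in reality */
    int c2 = SAME_TYPE(as_int + 1, real_long + 1);
    /* N + 2 with char N: char is promoted to int, so int both ways */
    int c3 = SAME_TYPE(as_int + 2, real_char + 2);
    /* (int)N with double N: the cast fixes the parent type to int */
    int c4 = SAME_TYPE((int)as_int, (int)real_double);

    printf("N + 1L, long N:   %d\n", c1);
    printf("N + 1,  long N:   %d\n", c2);
    printf("N + 2,  char N:   %d\n", c3);
    printf("(int)N, double N: %d\n", c4);
    return c1 + c2 + c3 + c4;
}
```

Under this reading, the only case of the four whose parent expression changes
type is `N + 1` with `long N`, which corresponds to the first example in the
"would *not* allow" list; adding the cast `(long)__self.N` suggested by the
error message makes the type explicit.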
