Re: [PATCH] [RFC] Delayed parsing for bounds safety attributes

Yeoul Na Wed, 30 Jul 2025 09:06:43 -0700


> On Jul 29, 2025, at 7:48 PM, Bill Wendling <mo...@google.com> wrote:
> 
> On Mon, Jul 28, 2025 at 11:36 PM Martin Uecker <ma.uec...@gmail.com> wrote:
>> 
>> On Monday, 28.07.2025 at 17:45 -0700, Bill Wendling wrote:
>>> On Mon, Jul 28, 2025 at 4:29 PM Martin Uecker <ma.uec...@gmail.com> wrote:
>>>> On Monday, 28.07.2025 at 16:01 -0700, Bill Wendling wrote:
>>>>> On Mon, Jul 28, 2025 at 2:39 PM Martin Uecker <ma.uec...@gmail.com> wrote:
>>>>>> Yes, forward declarations are the simplest solution.
>>>>>> 
>>>>> Forward declarations work until you get something complex. For
>>>>> example, if we want to support substructure fields in the attribute,
>>>>> you'd have to replicate the whole substructure's declaration in the
>>>>> forward decl. That becomes unwieldy when the substructure is very big:
>>>>> 
>>>>> struct foo {
>>>>>    char *buf __counted_by_expr(struct bar { ... } sub; sub.a.b);
>>>>>    struct bar {
>>>>>        int x, y;
>>>>>        struct baz {
>>>>>            int b;
>>>>>            /* 20 other elements */
>>>>>        } a;
>>>>>        unsigned b1 : 1;
>>>>>        unsigned b2 : 4;
>>>>>        unsigned : 8;
>>>>>        unsigned b3 : 2;
>>>>>        /* and on and on... */
>>>>>    } sub;
>>>>> };
>>>> 
>>>> You could write it like this:
>>>> 
>>>> struct bar {
>>>>        int x, y;
>>>>        struct baz {
>>>>            int b;
>>>>            /* 20 other elements */
>>>>        } a;
>>>>        unsigned b1 : 1;
>>>>        unsigned b2 : 4;
>>>>        unsigned : 8;
>>>>        unsigned b3 : 2;
>>>>        /* and on and on... */
>>>> };
>>>> 
>>>> struct foo {
>>>>    char *buf __counted_by_expr(struct bar sub; sub.a.b);
>>>>    struct bar sub;
>>>> };
>>> 
>>> One could always rewrite software to fit some random feature; the
>>> question is whether they're going to want to do that.
>>> 
>> Rewriting is a bit exaggerated, but yes, it would make it slightly
>> more effort to use it.  My point is that it does not require
>> duplicating the structure in the attribute.
>> 
>> 
>>> Or even if it's
>>> feasible to do so. The idea is to make this feature as easy as
>>> possible to add to existing code. Both Clang and GCC have the ability
>>> to delay parsing the attribute until the struct has finished parsing.
>>> You've offered no argument against that except for some vague worries
>>> about "context". While context is definitely important to parsing, the
>>> expected expressions are a strictly defined subset of generalized
>>> expressions. Parsing an affine equation doesn't require a ton of
>>> context, except for resolved types in the case of delayed parsing.
>> 
>> You are right, if you restrict it enough you do not have too much
>> issues with context.  You just have to add a lot of constraints
>> to the parser to restrict it to this sublanguage.
>> Whether this is then actually easier to maintain than just
>> having a small parser for the sublanguage is not obvious to me.
>> 
>> I read your paragraph about the sublanguage. I assume it is
>> this one from Friday?  (If not, I did miss it)
>> 
>> -----snip-------
>> I'm not convinced that it's that complex. We have total control over
>> which expressions are valid and which aren't. We would basically be
>> allowing: addition; subtraction; multiplication; division; some
>> language features that are resolved by the front-end, like "sizeof()",
>> "offsetof()", etc.; and a limited number of function calls (maybe
>> requiring the function to be __pure or __const, but which themselves
>> should be "simple" and always inlined). We want to do this because it
>> makes tracking changes to fields that could violate bounds safety
>> checking rules much easier.
>> -----snip------
>> 
> I apologize for saying that you didn't read what I wrote.
> 
>> This does not seem to include all features Kees and John mentioned,
>> and it also wasn't clear at the beginning of the discussion how
>> much you want to constrain the language.  It is still not fully clear
>> to me.
>> 
>> I think working on specifying the exact sublanguage would be more
>> helpful than demonstrating how you can delay-parse a single identifier.
>> 
> I asked on the Discourse thread and @hnrklssn responded with this [1]:
> 
> ---quote block---
> We currently allow:
> 
> * Decl refs
>  * must be declared in the same exact scope as the counted pointer,
> or be constants
>  * for flexible array members with `counted_by`, nested struct lookup
> is allowed within the “self” struct (`header.len`). This is not
> allowed for pointers with `counted_by`.
> * one level of dereference, but only if it’s a function parameter
> * arithmetic, bitwise and logical binary operators
> * calls to functions with `__attribute__((const))`
> * integer literals or expressions that the frontend constant folds to
> an integer constant (e.g. `sizeof`, `offsetof`)
> * integer casts


FYI, this is what we currently support. However, "explicit integer casts" are 
not a requirement, so we can potentially remove them from this list if necessary. 
Then we can probably reduce it to a context-free subset, if that makes 
things easier.
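To make the context-free point a bit more concrete, here is a rough C sketch (not Clang's or GCC's implementation) of a recognizer for a cast-free version of the list above. Casts are one construct that needs context: in C, `(x) - y` is a cast of `-y` when `x` is a typedef name and a subtraction otherwise, so the parser has to know what `x` names. Without casts, `(x)` is always a parenthesized expression. The function name `is_valid_count_expr` and the exact token set are hypothetical. The sketch accepts identifiers, `a.b.c` member paths, decimal literals, parentheses, calls, and the arithmetic, bitwise and logical binary operators. It leaves out parameter dereference and `sizeof`/`offsetof` on types, and it does not check that a callee is `__attribute__((const))`, since those need semantic information.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Token kinds recognized by this sketch; anything else becomes TOK_BAD. */
typedef enum {
    TOK_END, TOK_IDENT, TOK_INT, TOK_BINOP,
    TOK_LPAREN, TOK_RPAREN, TOK_COMMA, TOK_DOT, TOK_BAD
} TokKind;

typedef struct {
    const char *p;  /* current position in the input */
    TokKind kind;   /* kind of the current (lookahead) token */
} Lexer;

/* Advance to the next token and store its kind in lx->kind. */
static void next(Lexer *lx) {
    while (isspace((unsigned char)*lx->p))
        lx->p++;
    const char *s = lx->p;
    if (*s == '\0') { lx->kind = TOK_END; return; }
    if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*lx->p) || *lx->p == '_')
            lx->p++;
        lx->kind = TOK_IDENT;
        return;
    }
    if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*lx->p))
            lx->p++;
        /* Reject "12abc" instead of splitting it into two tokens. */
        bool junk = isalpha((unsigned char)*lx->p) || *lx->p == '_';
        lx->kind = junk ? TOK_BAD : TOK_INT;
        return;
    }
    /* Allowed two-character operators; "++", "--", "->", "<=" etc. are not. */
    static const char *const two[] = { "<<", ">>", "&&", "||" };
    for (size_t i = 0; i < sizeof two / sizeof two[0]; i++) {
        if (strncmp(s, two[i], 2) == 0) {
            lx->p += 2;
            lx->kind = TOK_BINOP;
            return;
        }
    }
    lx->p++;
    switch (*s) {
    case '+': case '-': case '*': case '/': case '%':
    case '&': case '|': case '^':
        lx->kind = TOK_BINOP; break;
    case '(': lx->kind = TOK_LPAREN; break;
    case ')': lx->kind = TOK_RPAREN; break;
    case ',': lx->kind = TOK_COMMA; break;
    case '.': lx->kind = TOK_DOT; break;
    default:  lx->kind = TOK_BAD; break;  /* '=', '?', '<', '>', '{', ';', ... */
    }
}

static bool parse_expr(Lexer *lx);

/* primary := INT | '(' expr ')' | IDENT '(' [expr {',' expr}] ')'
 *          | IDENT {'.' IDENT}                                     */
static bool parse_primary(Lexer *lx) {
    switch (lx->kind) {
    case TOK_INT:
        next(lx);
        return true;
    case TOK_LPAREN:  /* never a cast: there is no type-name rule */
        next(lx);
        if (!parse_expr(lx) || lx->kind != TOK_RPAREN)
            return false;
        next(lx);
        return true;
    case TOK_IDENT:
        next(lx);
        if (lx->kind == TOK_LPAREN) {  /* call; const-ness checked elsewhere */
            next(lx);
            if (lx->kind == TOK_RPAREN) { next(lx); return true; }
            for (;;) {
                if (!parse_expr(lx)) return false;
                if (lx->kind == TOK_RPAREN) { next(lx); return true; }
                if (lx->kind != TOK_COMMA) return false;
                next(lx);
            }
        }
        while (lx->kind == TOK_DOT) {  /* member path such as header.len */
            next(lx);
            if (lx->kind != TOK_IDENT) return false;
            next(lx);
        }
        return true;
    default:
        return false;
    }
}

/* expr := primary {BINOP primary}; precedence does not affect accept/reject. */
static bool parse_expr(Lexer *lx) {
    if (!parse_primary(lx)) return false;
    while (lx->kind == TOK_BINOP) {
        next(lx);
        if (!parse_primary(lx)) return false;
    }
    return true;
}

/* Hypothetical entry point: true if src is in the sketch's subset. */
bool is_valid_count_expr(const char *src) {
    Lexer lx = { src, TOK_END };
    next(&lx);
    return parse_expr(&lx) && lx.kind == TOK_END;
}
```

With this sketch, `is_valid_count_expr("header.len << 3")` and `is_valid_count_expr("f(a, b) && (n | 4)")` return true, while `"len = 4"`, `"len++"`, `"p->len"` and `"(size_t)len"` return false. Because it only accepts or rejects, it uses a flat `primary {op primary}` grammar; a real parser would still need operator precedence to build the expression tree.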

> 
> We never allow:
> 
> * assignments
> * `->`
> * increment/decrement operators
> * statement expressions
> * new declarations
> 
> I would say “simple” expressions are simple because they are simple
> for the compiler to reason about. Side effects, differences in
> lifetime, and aliasing are the main things we want to prevent. Control
> flow is also nice to avoid, although we might consider supporting the
> ternary operator in the future. Does that answer your question?
> ---quote block---
> 
> I think it's a very reasonable subset of generalized expressions. I
> didn't summarize it well enough. I'm sure that if it's not sufficient
> for our use, we could expand it to include other elements, but it
> would be a very high bar.
> 
> This leaves the problem of scoping, i.e. either using the dot-notation
> or __self identifier. I'm not against either of them, though I have a
> preference for the dot-notation. (It would involve modifying the
> parser to accept expressions with dot-identifier combos in them when
> parsing these attributes, but that shouldn't be too difficult to do.)
> 
> -bw
> 
> [1] 
> https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885/87?u=void
