+josmy...@redhat.com
On Sun, Jul 20, 2025 at 4:20 AM <mo...@google.com> wrote: > > From: Bill Wendling <mo...@google.com> > > Also, this code doesn't go further than parsing. I.e., it doesn't generate the > internal gimple code that accesses the struct fields. The code is meant to > show > that it *is* possible to perform a delayed parsing with no "double parsing" > and > still be performant. > > Minor Nomenclature Change > ------------------------- > > I (try to) use "bounds safety" instead of "counted_by" in the code. This is > because Clang has many more bounds safety attributes in the pipeline and > "bounds safety" seems like a better overall name for the set of attributes. > > Token-Based Delayed Parsing > --------------------------- > > The solution introduces a token capture and delayed parsing mechanism that > stores tokens during initial parsing and performs the actual parse later when > the struct context is available. This isn't a novel idea, as the OMP code does > something similar for directives. > > After storing the tokens, the parsing continues as if the expression was > parsed > (but it hasn't been yet). The parsing is delayed until the > 'verify_counted_by_attribute ()' call, which performs the actual parse and > reports any semantic errors. (The actual parse is done simply by creating a > new > 'c_parser' object and filling it with the delayed tokens.) > > On a successful parse, the 'C_TOKEN_VEC' tree node is converted into a tree > node holding an expression. The token vector is freed afterwards. > > If this approach is deemed worthy, I would like to incorporate the "single > identifier" parsing into the delayed parsing mechanism (a single identifier is > just a simple expression). > > RFC > --- > > - Is this something that GCC is interested in to get us past this hurdle > that's delaying the upstreaming of more bounds safety checks? > - Will this solve the issues at hand? (For this implementation, I'm > specifically focusing struct field attributes. I haven't thought much about > parameter list attributes.) > - I think there's a layering violation. The 'c-decl.cc' "verifies" the > counted_by argument, but it's in essence performing an identifier > resolution, > which is something the parser should be doing. Perhaps that code should be > moved into "finish_struct ()", or somewhere else in "c/c-parser.cc"? > - My GCC-fu isn't strong and I may have abused some parts of the parser in > unholy ways. (I blame beer.) > - Anything else you'd like to say or recommend. > > Disclaimer > ---------- > > No AI's were harmed in the making of this RFC, though one did have its > feelings > hurt. > > Share and enjoy! > -bw > > > [1] > https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885 > > --- > To: gcc-patches@gcc.gnu.org > Cc: Kees Cook <k...@kernel.org> > Cc: Nick Desaulniers <nick.desaulni...@gmail.com> > Cc: Martin Uecker <uec...@tugraz.at> > Cc: Qing Zhao <qing.z...@oracle.com> > Cc: Jakub Jelinek <ja...@redhat.com> > Cc: Richard Biener <richard.guent...@gmail.com> > Cc: Yeoul Na <yeoul...@apple.com> > Cc: John McCall <rjmcc...@apple.com> > Cc: Aaron Ballman <aa...@aaronballman.com> > Cc: Justin Stitt <justinst...@google.com> > > v1: - Initial RFC submitted. > - Kees Cook's Linus asbestos suit borrowed. > --- > gcc/attribs.cc | 8 ++- > gcc/c-family/c-attribs.cc | 37 +++++++++--- > gcc/c/c-decl.cc | 97 ++++++++++++++++++++++-------- > gcc/c/c-parser.cc | 123 ++++++++++++++++++++++++++++++++++++-- > gcc/c/c-parser.h | 1 + > gcc/c/c-typeck.cc | 21 +++++++ > 6 files changed, 246 insertions(+), 41 deletions(-) > > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > index 3fce9d625256..74001889c72a 100644 > --- a/gcc/attribs.cc > +++ b/gcc/attribs.cc > @@ -752,8 +752,10 @@ decl_attributes (tree *node, tree attributes, int flags, > } > else > { > - int nargs = list_length (args); > - if (nargs < spec->min_length > + int nargs = args && TREE_CODE (args) == C_TOKEN_VEC > + ? 1 > + : list_length (args); > + if (nargs < spec->min_length > || (spec->max_length >= 0 > && nargs > spec->max_length)) > { > @@ -771,7 +773,7 @@ decl_attributes (tree *node, tree attributes, int flags, > spec->min_length, spec->max_length, nargs); > continue; > } > - } > + } > gcc_assert (is_attribute_p (spec->name, name)); > > if (spec->decl_required && !DECL_P (*anode)) > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc > index 1f4a0df12051..0e8c17440b9a 100644 > --- a/gcc/c-family/c-attribs.cc > +++ b/gcc/c-family/c-attribs.cc > @@ -2887,7 +2887,14 @@ handle_counted_by_attribute (tree *node, tree name, > bool *no_add_attrs) > { > tree decl = *node; > - tree argval = TREE_VALUE (args); > + tree argval; > + > + /* Handle both C_TOKEN_VEC and regular argument formats. */ > + if (TREE_CODE (args) == C_TOKEN_VEC) > + argval = args; > + else > + argval = TREE_VALUE (args); > + > tree old_counted_by = lookup_attribute ("counted_by", DECL_ATTRIBUTES > (decl)); > > /* This attribute is not supported in C++. */ > @@ -2922,11 +2929,13 @@ handle_counted_by_attribute (tree *node, tree name, > " array member field", name); > *no_add_attrs = true; > } > - /* The argument should be an identifier. */ > - else if (TREE_CODE (argval) != IDENTIFIER_NODE) > + /* The argument should be an identifier or a C_TOKEN_VEC for complex > + expressions. */ > + else if (TREE_CODE (argval) != IDENTIFIER_NODE > + && TREE_CODE (argval) != C_TOKEN_VEC) > { > error_at (DECL_SOURCE_LOCATION (decl), > - "%<counted_by%> argument is not an identifier"); > + "%<counted_by%> argument is not an identifier or > expression"); > *no_add_attrs = true; > } > /* Issue error when there is a counted_by attribute with a different > @@ -2934,14 +2943,28 @@ handle_counted_by_attribute (tree *node, tree name, > else if (old_counted_by != NULL_TREE) > { > tree old_fieldname = TREE_VALUE (TREE_VALUE (old_counted_by)); > - if (strcmp (IDENTIFIER_POINTER (old_fieldname), > - IDENTIFIER_POINTER (argval)) != 0) > - { > + /* Only check conflicts for simple identifiers; complex expressions > + will be validated during struct completion. */ > + if (TREE_CODE (argval) == IDENTIFIER_NODE > + && TREE_CODE (old_fieldname) == IDENTIFIER_NODE > + && strcmp (IDENTIFIER_POINTER (old_fieldname), > + IDENTIFIER_POINTER (argval)) > + != 0) > + { > error_at (DECL_SOURCE_LOCATION (decl), > "%<counted_by%> argument %qE conflicts with" > " previous declaration %qE", argval, old_fieldname); > *no_add_attrs = true; > } > + else if (TREE_CODE (argval) == C_TOKEN_VEC > + || TREE_CODE (old_fieldname) == C_TOKEN_VEC) > + { > + /* For complex expressions, defer conflict checking to struct > + completion. */ > + warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes, > + "multiple %<counted_by%> attributes on the same field; > " > + "only the last one will be used"); > + } > } > > return NULL_TREE; > diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc > index acbe2b88e674..615d0f5a1e88 100644 > --- a/gcc/c/c-decl.cc > +++ b/gcc/c/c-decl.cc > @@ -9445,43 +9445,90 @@ verify_counted_by_attribute (tree struct_type, tree > field_decl) > if (!attr_counted_by) > return; > > - /* If there is an counted_by attribute attached to the field, > + /* If there is a counted_by attribute attached to the field, > verify it. */ > > - tree fieldname = TREE_VALUE (TREE_VALUE (attr_counted_by)); > + tree argument; > + tree parsed_expr = NULL_TREE; > > - /* Verify the argument of the attrbute is a valid field of the > - containing structure. */ > - > - tree counted_by_field = lookup_field (struct_type, fieldname); > - > - /* Error when the field is not found in the containing structure and > - remove the corresponding counted_by attribute from the field_decl. */ > - if (!counted_by_field) > - { > - error_at (DECL_SOURCE_LOCATION (field_decl), > - "argument %qE to the %<counted_by%> attribute" > - " is not a field declaration in the same structure" > - " as %qD", fieldname, field_decl); > - DECL_ATTRIBUTES (field_decl) > - = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl)); > - } > + /* Handle both C_TOKEN_VEC and regular argument formats */ > + tree args = TREE_VALUE (attr_counted_by); > + if (TREE_CODE (args) == C_TOKEN_VEC) > + argument = args; /* Direct C_TOKEN_VEC */ > else > - /* Error when the field is not with an integer type. */ > + argument = TREE_VALUE (args); /* Standard tree list format */ > + > + /* Handle both simple identifiers and complex expressions. */ > + if (TREE_CODE (argument) == IDENTIFIER_NODE) > { > + /* Simple identifier case - existing logic. */ > + tree counted_by_field = lookup_field (struct_type, argument); > + > + if (!counted_by_field) > + { > + error_at (DECL_SOURCE_LOCATION (field_decl), > + "argument %qE to the %<counted_by%> attribute" > + " is not a field declaration in the same structure" > + " as %qD", > + argument, field_decl); > + DECL_ATTRIBUTES (field_decl) > + = remove_attribute ("counted_by", DECL_ATTRIBUTES > (field_decl)); > + return; > + } > + > + /* Find the actual field declaration. */ > while (TREE_CHAIN (counted_by_field)) > counted_by_field = TREE_CHAIN (counted_by_field); > tree real_field = TREE_VALUE (counted_by_field); > > if (!INTEGRAL_TYPE_P (TREE_TYPE (real_field))) > { > - error_at (DECL_SOURCE_LOCATION (field_decl), > - "argument %qE to the %<counted_by%> attribute" > - " is not a field declaration with an integer type", > - fieldname); > - DECL_ATTRIBUTES (field_decl) > + error_at (DECL_SOURCE_LOCATION (field_decl), > + "argument %qE to the %<counted_by%> attribute" > + " is not a field declaration with an integer type", > + argument); > + DECL_ATTRIBUTES (field_decl) > + = remove_attribute ("counted_by", DECL_ATTRIBUTES > (field_decl)); > + return; > + } > + } > + else if (TREE_CODE (argument) == C_TOKEN_VEC) > + { > + /* Complex expression case - parse the stored tokens. */ > + parsed_expr > + = c_parse_token_vec_expression_with_struct (argument, struct_type); > + > + if (!parsed_expr) > + { > + error_at (DECL_SOURCE_LOCATION (field_decl), > + "invalid expression in %<counted_by%> attribute"); > + DECL_ATTRIBUTES (field_decl) > = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl)); > - } > + return; > + } > + > + /* Validate that the expression result is integer type. */ > + if (!INTEGRAL_TYPE_P (TREE_TYPE (parsed_expr))) > + { > + error_at (DECL_SOURCE_LOCATION (field_decl), > + "expression in %<counted_by%> attribute does not" > + " have integer type"); > + DECL_ATTRIBUTES (field_decl) > + = remove_attribute ("counted_by", DECL_ATTRIBUTES > (field_decl)); > + return; > + } > + > + /* If we successfully parsed the expression, replace the C_TOKEN_VEC > + with the parsed expression for later use. */ > + TREE_VALUE (attr_counted_by) = build_tree_list (NULL_TREE, > parsed_expr); > + } > + else > + { > + error_at (DECL_SOURCE_LOCATION (field_decl), > + "invalid argument to %<counted_by%> attribute"); > + DECL_ATTRIBUTES (field_decl) > + = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl)); > + return; > } > } > > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc > index 5119841a5895..a9c641151d39 100644 > --- a/gcc/c/c-parser.cc > +++ b/gcc/c/c-parser.cc > @@ -5623,6 +5623,30 @@ c_parser_attribute_arguments (c_parser *parser, bool > takes_identifier, > return attr_args; > } > > +static bool > +c_parser_bounds_safety_attr (tree attr_name) > +{ > + const char *name = IDENTIFIER_POINTER (attr_name); > + return !strcmp (name, "counted_by"); > +} > + > +/* Check if the bounds safety attribute needs complex expression parsing. */ > +static bool > +c_parser_needs_complex_bounds_parsing (c_parser *parser) > +{ > + /* Look ahead to see if this is a simple identifier or a complex > + expression. */ > + c_token *tok = c_parser_peek_token (parser); > + if (tok->type != CPP_NAME) > + return true; > + > + c_token *next = c_parser_peek_2nd_token (parser); > + if (next->type == CPP_CLOSE_PAREN) > + return false; > + > + return true; > +} > + > /* Parse (possibly empty) gnu-attributes. This is a GNU extension. > > gnu-attributes: > @@ -5692,12 +5716,56 @@ c_parser_gnu_attribute (c_parser *parser, tree attrs, > } > c_parser_consume_token (parser); > > - tree attr_args > - = c_parser_attribute_arguments (parser, > - attribute_takes_identifier_p (attr_name), > - false, > - is_attribute_p ("assume", attr_name), > - true); > + tree attr_args; > + if (!c_parser_bounds_safety_attr (attr_name) > + || !c_parser_needs_complex_bounds_parsing (parser)) > + attr_args = c_parser_attribute_arguments ( > + parser, attribute_takes_identifier_p (attr_name), false, > + is_attribute_p ("assume", attr_name), true); > + else > + { > + unsigned int n = 1; > + unsigned int paren_count = 0; > + > + for (; true; ++n) > + { > + c_token *tok = c_parser_peek_nth_token (parser, n); > + > + /* Safety check to avoid infinite loops. */ > + if (tok->type == CPP_EOF) > + { > + c_parser_error (parser, "expected identifier or expression"); > + return error_mark_node; > + } > + > + if (tok->type == CPP_CLOSE_PAREN) > + { > + if (!paren_count) > + break; > + paren_count--; > + } > + else if (tok->type == CPP_OPEN_PAREN) > + paren_count++; > + else if (tok->type == CPP_SEMICOLON) > + { > + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, > + "expected identifier or > expression"); > + return error_mark_node; > + } > + } > + > + vec<c_token, va_gc> *v; > + vec_alloc (v, n - 1); > + for (--n; n; --n) > + { > + c_token *tok = c_parser_peek_token (parser); > + v->quick_push (*tok); > + c_parser_consume_token (parser); > + } > + > + attr_args = make_node (C_TOKEN_VEC); > + C_TOKEN_VEC_TOKENS (attr_args) = v; > + } > > attr = build_tree_list (attr_name, attr_args); > if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN)) > @@ -27746,6 +27814,49 @@ c_maybe_parse_omp_decl (tree decl, tree d) > return true; > } > > +/* Context for parsing counted_by expressions within struct scope. */ > +tree bounds_safety_attr_struct_type = NULL_TREE; > + > +/* Parse an expression from a C_TOKEN_VEC. This is used for delayed > + parsing of counted_by attribute expressions. */ > + > +tree > +c_parse_token_vec_expression (tree token_vec) > +{ > + vec<c_token, va_gc> *tokens = C_TOKEN_VEC_TOKENS (token_vec); > + > + if (!tokens || vec_safe_is_empty (tokens)) > + return NULL_TREE; > + > + /* Create a parser for just the list of tokens. */ > + c_parser delay_parser = {}; > + delay_parser.tokens = tokens->address (); > + delay_parser.tokens_avail = tokens->length (); > + delay_parser.raw_tokens = tokens; > + > + /* Parse the expression. */ > + c_expr expr = c_parser_expr_no_commas (&delay_parser, NULL); > + > + if (expr.value == error_mark_node) > + return NULL_TREE; > + > + return expr.value; > +} > + > +/* Parse an expression from a C_TOKEN_VEC with struct context. */ > + > +tree > +c_parse_token_vec_expression_with_struct (tree token_vec, tree struct_type) > +{ > + tree saved_struct_type = bounds_safety_attr_struct_type; > + bounds_safety_attr_struct_type = struct_type; > + > + tree result = c_parse_token_vec_expression (token_vec); > + > + bounds_safety_attr_struct_type = saved_struct_type; > + return result; > +} > + > /* OpenMP 4.0: > # pragma omp declare target new-line > declarations and definitions > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h > index a84779bcbf83..577644cef13a 100644 > --- a/gcc/c/c-parser.h > +++ b/gcc/c/c-parser.h > @@ -205,5 +205,6 @@ extern void c_parser_declspecs (c_parser *, struct > c_declspecs *, bool, bool, > enum c_lookahead_kind); > extern struct c_type_name *c_parser_type_name (c_parser *, bool = false); > extern bool c_maybe_parse_omp_decl (tree, tree); > +extern tree c_parse_token_vec_expression_with_struct (tree, tree); > > #endif > diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc > index f161bd9d0e74..788c0fe3763f 100644 > --- a/gcc/c/c-typeck.cc > +++ b/gcc/c/c-typeck.cc > @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3. If not see > #include "tree-pretty-print-markup.h" > #include "gcc-urlifier.h" > > +/* Forward declaration for bounds safety attribute context. */ > +extern tree bounds_safety_attr_struct_type; > + > /* Possible cases of implicit conversions. Used to select diagnostic > messages > and control folding initializers in convert_for_assignment. */ > enum impl_conv { > @@ -3528,6 +3531,24 @@ build_external_ref (location_t loc, tree id, bool fun, > tree *type) > return error_mark_node; > else > { > + /* Check if we're parsing a counted_by expression and can resolve > + the identifier as a struct field. */ > + if (bounds_safety_attr_struct_type) > + { > + tree field = lookup_field (bounds_safety_attr_struct_type, id); > + if (field) > + { > + /* We found a field - create a reference to it */ > + while (TREE_CHAIN (field)) > + field = TREE_CHAIN (field); > + tree real_field = TREE_VALUE (field); > + > + ref = real_field; > + *type = TREE_TYPE (ref); > + return ref; > + } > + } > + > undeclared_variable (loc, id); > return error_mark_node; > }