Is it reasonable to add one option to disable the “counted_by” attribute?
(then no insertion of the new .ACCESS_WITH_SIZE into IL).  

The major reason is: some users might want to ignore all the “counted_by” 
attribute added in the source code,
We need to provide them a way to disable this feature.

thanks.

Qing

> On Nov 6, 2023, at 7:12 PM, Qing Zhao <qing.z...@oracle.com> wrote:
> 
> Hi,
> 
> Attached is the 2nd version of the proposal based on all the discussion so 
> far.
> 
> Let me know if you have more comment and suggestion.
> 
> Thanks a lot for all the help.
> 
> Qing
> ===========================================
> Represent the missing dependence for the "counted_by" attribute and its 
> consumers 
> 
> Qing Zhao
> 
> 11/06/2023
> ==============================================
> 
> The whole discussion is at:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html
> 
> 1. The problem
> 
> There is a data dependency between the size assignment and the implicit use 
> of the size information in the __builtin_dynamic_object_size that is missing 
> in the IL (line 11 and line 13 in the below example). Such information 
> missing will result incorrect code reordering and other code transformations. 
> 
>  1 struct A
>  2 {
>  3  size_t size;
>  4  char buf[] __attribute__((counted_by(size)));
>  5 };
>  6 
>  7 size_t 
>  8 foo (size_t sz)
>  9 {
> 10  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 11  obj->size = sz;
> 12  obj->buf[0] = 2;
> 13  return __builtin_dynamic_object_size (obj->buf, 1);
> 14 }
> 
> Please see a more complicate example in the Appendex 1.
> 
> We need to represent such data dependency correctly in the IL. 
> 
> 2. The solution:
> 
> 2.1 Summary
> 
> * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size 
> information for every reference to a FAM field;
> * In C FE, Replace every reference to a FAM field whose TYPE has the 
> "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE";
> * In every consumer of the size information, for example, BDOS or array bound 
> sanitizer, query the size information or ACCESS_MODE information from the new 
> internal function;
> * When the size information and the "ACCESS_MODE" information are not used 
> anymore, possibly at the 2nd object size phase, replace the internal function 
> with the actual reference to the FAM field; 
> * Some adjustment to inlining heuristic, ipa alias analysis, and other SSA 
> passes to mitigate the impact to the optimizer and code generation. 
> 
> 2.2 The new internal function 
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
> ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the "REF_TO_OBJ" same as the 1st argument;
> 
> 1st argument "REF_TO_OBJ": The reference to the object;
> 2nd argument "REF_TO_SIZE": The reference to the size of the object, 
> 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE 
> represents 
>   0: unknown;
>   1: the number of the elements of the object type;
>   2: the number of bytes; 
> 4th argument "SIZE_OF_SIZE": how many bytes is the object that REF_TO_SIZE 
> points;
> 5th argument "ACCESS_MODE": 
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write
> 
> NOTEs, 
>  A. This new internal function is intended for a more general use from all 
> the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode 
> the "size" and "access_mode" information to the corresponding pointer. (in 
> order to resolve PR96503, etc. 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503);
>  B. For "counted_by" the 3rd argument will be 1;
>  C. For "counted_by" and "alloc_size" attributes, the 5th argument will be 
> -1;   
>  D. In this wrieup, we focus on the implementation details for the 
> "counted_by" attribute. However, this function should be ready to be used by 
> "access" and "alloc_size" without issue. 
> 
> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
>  struct A
>  {
>   size_t size;
>   char buf[] __attribute__((counted_by(size)));
>  };
> 
> for any object with such type:
> 
>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The initialization to the size field should be done before the first 
> reference to the FAM field,
> Otherwise, the behavior is undefined.
> Such additional requirement to the user will guarantee that the first 
> reference to the FAM knows the size of the FAM.  
> 
> Another thing that need to be clarified is:
> A later reference to the FAM field will use the latest value assigned to the 
> size field before that reference. For example, 
> obj->size = val1;
> ref1 (obj->buf);
> obj->size = val2;
> ref2 (obj->buf);
> in the above, "ref1" will use val1 and "ref2" will use val2. 
> This clarification will inform user that the dynamic array feature is fully 
> supported.
> 
> We need to add the above additional requirement and clarification to the user 
> documentation.
> The complete user documentation is in Appendix 2. 
> 
> 2.4 Replace the reference to a FAM field with the new function 
> .ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small example,
>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>  if YES, replace the reference to "obj->buf" with a call to
>      .ACCESS_WITH_SIZE (obj->buf, &obj->size, 1, sizeof(obj->size), -1); 
> 
> This includes the case when the object is statically allocated and 
> initialized as following example:
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = &x.buf;
> 
> The reference to x.buf should be replaced by the .ACCESS_WITH_SIZE (&x.buf, 
> &x.size, 1, sizeof(x.size), -1).
> 
> 
> 2.5 Query the size info 
> 
> There are multiple consumers of the size info (and ACCESS_MODE info):
> 
>  * __builtin_dynamic_object_size;
>  * array bound sanitizer;
> 
> in these consumers, get the size info from the 2nd, 3rd, and 4th arguments of 
> the call to
> ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, 1, sizeof(SIZE), -1)
> 
> 2.6 Eliminate the internal function when not useful anymore
> 
> After the last consumer of the size information in the ACCESS_WITH_SIZE, We 
> should replace the internal call with its first argument.
> 
> Do it in the 2nd object size phase. 
> 
> 2.7 Adjustment to inlining heuristic, IPA alias analysis and other IPA 
> analysis
> 
> the FE changes:
> 
> obj->buf
> 
> to
> 
> _1 = obj->buf;
> _2 = &(obj->size);
> .ACCESS_WITH_SIZE (_1, _2, 1, sizeof (obj->size), -1)
> 
> there are major two changes:
> 
>  A. the # of Insns of the routine will be increased,
>  B. additional calls to an non-pure internal routines.
> 
> In order to minimize the impact on code generation and optimizaitons, We 
> might need to
> 
>  * Adjust Inlining decision to ignore the additional insns and calls;
>  * Adjust the routine "call_may_clobber_ref_p_1" in tree-ssa-alias.cc to 
> exclude the new internal .ACCESS_WITH_SIZE from clobbering anything. 
> 
> 
> 3. The patch sets:
> 
> We will provide the following patches:
> 
>  3.1 Provide counted_by attribute to flexible array member field;
>      which includes:
>      * "counted_by" attribute documentation;  
>      * C FE handling of the new attribute; 
>        syntax checking, error reporting;
>      * testing cases; (aditional testing cases for staticly initialized 
> object)
> 
>  3.2 Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE.  
>      which includes:
>      * The definition of the new internal function .ACCESS_WITH_SIZE in 
> internal-fn.def. 
>      * C FE converts every reference to a FAM with "counted_by" attribute to 
> a call to the internal function .ACCESS_WITH_SIZE.
>        (build_component_ref in c_typeck.cc)
>      * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>        (After the size and access_mode information are used in the 2nd object 
> size phase, we need to convert .ACCESS_WITH_SIZE back to its first argument 
> as soon as possible in order to minimize the impact to other phases after 
> object size phase. So, do this in the end of the 2nd object size phase)
>      * adjust inlining heuristic for the additional call. 
>      * adjust alias analysis to exclude the new internal from clobbering 
> anything.
>        (call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
>      * verify a call to .ACCESS_WITH_SIZE in tree-cfg.cc.
>      * provide utility routines to get the size or ACCESS_MODE from the call 
> to .ACCESS_WITH_SIZE. 
> 
>  3.3 Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only)
>      which includes: 
>      * use the size info of the .ACCESS_WITH_SIZE for sub-object.
>      * testing cases. (additional testing cases for dynamic array support)
> 
>  3.4 Use the .ACCESS_WITH_SIZE in bound sanitizer
>      which includes:
>      * use the size info of the .ACCESS_WITH_SIZE for bound sanitizer.
>      * testing cases. (additional testing cases for dynamic array support)
> 
>  3.5 Improve __bdos to use the counted_by info in whole-object size for the 
> structure with FAM. 
>  3.6 Emit warnings when the user breaks the requirments for the new 
> counted_by attribute
>      compilation time: -Wcounted-by
>      run time: -fsanitizer=counted-by
> 
>      * The initialization to the size field should be done before the first 
> reference to the FAM field. 
>      * the array has at least # of elements specified by the size field all 
> the time during the program.  
> 
> 
> 4. Future work
> 
>  4.1 Use the new .ACCESS_WITH_SIZE for other relative attributes, for 
> example, "access" and "alloc_size", to resolve the known issues for them:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503
> 
>  4.2 Add "counted_by" attribute to general pointers as proposed by the 
> -fbounds-safety proposal to LLVM from Apple 
> (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854).
>   
> 
> the .ACCESS_WITH_SIZE approach should be adopted naturally. 
> 
> 
> Appendex 1. A more complicated example: 
> 
>  1 #include <malloc.h>
>  2 struct A
>  3 {
>  4  size_t size;
>  5  char buf[] __attribute__((counted_by(size)));
>  6 };
>  7 
>  8 static size_t
>  9 get_size_from (void *ptr)
> 10 {
> 11  return __builtin_dynamic_object_size (ptr, 1);
> 12 }
> 13 
> 14 void
> 15 foo (size_t sz)
> 16 {
> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 18  obj->size = sz;
> 19  obj->buf[0] = 2;
> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
> 21  return;
> 22 }
> 23 
> 24 int main ()
> 25 {
> 26  foo (20);
> 27  return 0;
> 28 }
> 
> In this example, the instantiation of the object at line 17 is not in the 
> same routine as the _BDOS call at line 11. 
> 
> Appendex 2. The user documentation:
> 
> 'counted_by (COUNT)'
>     The 'counted_by' attribute may be attached to the flexible array
>     member of a structure.  It indicates that the number of the
>     elements of the array is given by the field named "COUNT" in the
>     same structure as the flexible array member.  GCC uses this
>     information to improve the results of the array bound sanitizer and
>     the '__builtin_dynamic_object_size'.
> 
>     For instance, the following code:
> 
>          struct P {
>            size_t count;
>            char other;
>            char array[] __attribute__ ((counted_by (count)));
>          } *p;
> 
>     specifies that the 'array' is a flexible array member whose number
>     of elements is given by the field 'count' in the same structure.
> 
>     The field that represents the number of the elements should have an
>     integer type. Otherwise, the compiler will report a warning and ignore 
>     the attribute.
> 
>     An explicit 'counted_by' annotation defines a relationship between
>     two objects, 'p->array' and 'p->count', and there are the following
>     requirementthat on the relationship between this pair:
> 
>     A. 'p->count' should be initialized before the first reference to 
>        'p->array'.
>     B. "p->array' has _at least_ 'p->count' number of elements available
>        all the time.
>        This requirement must hold even after any of these related objects
>        are updated during the program.
> 
>     It's the user's responsibility to make sure the above requirments to 
>     be kept all the time.  Otherwise the compiler will report warnings, 
>     at the same time, the results of the array bound sanitizer and the
>     '__builtin_dynamic_object_size' is undefined. 
> 
>     One important feature of the attribute is, A reference to the flexible
>     array member field will use the latest value assigned to the field that
>     represent the number of the elements before that reference. For example,
>       p->count = val1;
>       ref1 (p->array);
>       p->count = val2;
>       ref2 (p->array);
>     in the above, 'ref1' will use 'val1' as the number of the elements in
>     'p->array', and "ref2" will use 'val2' as the number of elements in
>     'p->array'.
> 
> 
> Appendex 3. Other approaches that have been discussed:
> 
> A.  Add an additional argument, the size parameter, to the call to __bdos, 
>      A.1, during FE;
>      A.2, during gimplification phase;
> B.  Encode the implicit USE in the type of size, to make the size as 
> “volatile”;
> C.  Encode the implicit USE in the type of buf, then update the optimization 
> passes to use this implicit USE encoded in the type of buf.
> 
> ** Approach A (both A.1 and A.2) will not work if the object instantiation is 
> not in the same routine as the __bdos call as shown in the above example;
> ** Approach B will have a bigger performance impact due to the "volatile" 
> will inhibit a lot of optimizations; 
> ** Approach C will involve a lot of changes in GCC, and also not very 
> necessary since the ONLY implicit use of the size parameter is in the __bdos 
> call when __bdos can see the real object.

Reply via email to