Is it reasonable to add one option to disable the “counted_by” attribute? (then no insertion of the new .ACCESS_WITH_SIZE into IL).
The major reason is: some users might want to ignore all the “counted_by” attribute added in the source code, We need to provide them a way to disable this feature. thanks. Qing > On Nov 6, 2023, at 7:12 PM, Qing Zhao <qing.z...@oracle.com> wrote: > > Hi, > > Attached is the 2nd version of the proposal based on all the discussion so > far. > > Let me know if you have more comment and suggestion. > > Thanks a lot for all the help. > > Qing > =========================================== > Represent the missing dependence for the "counted_by" attribute and its > consumers > > Qing Zhao > > 11/06/2023 > ============================================== > > The whole discussion is at: > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633783.html > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634844.html > > 1. The problem > > There is a data dependency between the size assignment and the implicit use > of the size information in the __builtin_dynamic_object_size that is missing > in the IL (line 11 and line 13 in the below example). Such information > missing will result incorrect code reordering and other code transformations. > > 1 struct A > 2 { > 3 size_t size; > 4 char buf[] __attribute__((counted_by(size))); > 5 }; > 6 > 7 size_t > 8 foo (size_t sz) > 9 { > 10 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); > 11 obj->size = sz; > 12 obj->buf[0] = 2; > 13 return __builtin_dynamic_object_size (obj->buf, 1); > 14 } > > Please see a more complicate example in the Appendex 1. > > We need to represent such data dependency correctly in the IL. > > 2. The solution: > > 2.1 Summary > > * Add a new internal function ".ACCESS_WITH_SIZE" to carry the size > information for every reference to a FAM field; > * In C FE, Replace every reference to a FAM field whose TYPE has the > "counted_by" attribute with the new internal function ".ACCESS_WITH_SIZE"; > * In every consumer of the size information, for example, BDOS or array bound > sanitizer, query the size information or ACCESS_MODE information from the new > internal function; > * When the size information and the "ACCESS_MODE" information are not used > anymore, possibly at the 2nd object size phase, replace the internal function > with the actual reference to the FAM field; > * Some adjustment to inlining heuristic, ipa alias analysis, and other SSA > passes to mitigate the impact to the optimizer and code generation. > > 2.2 The new internal function > > .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, > ACCESS_MODE) > > INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL) > > which returns the "REF_TO_OBJ" same as the 1st argument; > > 1st argument "REF_TO_OBJ": The reference to the object; > 2nd argument "REF_TO_SIZE": The reference to the size of the object, > 3rd argument "CLASS_OF_SIZE": The size referenced by the REF_TO_SIZE > represents > 0: unknown; > 1: the number of the elements of the object type; > 2: the number of bytes; > 4th argument "SIZE_OF_SIZE": how many bytes is the object that REF_TO_SIZE > points; > 5th argument "ACCESS_MODE": > -1: Unknown access semantics > 0: none > 1: read_only > 2: write_only > 3: read_write > > NOTEs, > A. This new internal function is intended for a more general use from all > the 3 attributes, "access", "alloc_size", and the new "counted_by", to encode > the "size" and "access_mode" information to the corresponding pointer. (in > order to resolve PR96503, etc. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503); > B. For "counted_by" the 3rd argument will be 1; > C. For "counted_by" and "alloc_size" attributes, the 5th argument will be > -1; > D. In this wrieup, we focus on the implementation details for the > "counted_by" attribute. However, this function should be ready to be used by > "access" and "alloc_size" without issue. > > 2.3 A new semantic requirement in the user documentation of "counted_by" > > For the following structure including a FAM with a counted_by attribute: > > struct A > { > size_t size; > char buf[] __attribute__((counted_by(size))); > }; > > for any object with such type: > > struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); > > The initialization to the size field should be done before the first > reference to the FAM field, > Otherwise, the behavior is undefined. > Such additional requirement to the user will guarantee that the first > reference to the FAM knows the size of the FAM. > > Another thing that need to be clarified is: > A later reference to the FAM field will use the latest value assigned to the > size field before that reference. For example, > obj->size = val1; > ref1 (obj->buf); > obj->size = val2; > ref2 (obj->buf); > in the above, "ref1" will use val1 and "ref2" will use val2. > This clarification will inform user that the dynamic array feature is fully > supported. > > We need to add the above additional requirement and clarification to the user > documentation. > The complete user documentation is in Appendix 2. > > 2.4 Replace the reference to a FAM field with the new function > .ACCESS_WITH_SIZE > > In C FE: > > for every reference to a FAM, for example, "obj->buf" in the small example, > check whether the corresponding FIELD_DECL has a "counted_by" attribute? > if YES, replace the reference to "obj->buf" with a call to > .ACCESS_WITH_SIZE (obj->buf, &obj->size, 1, sizeof(obj->size), -1); > > This includes the case when the object is statically allocated and > initialized as following example: > > static struct A x = { sizeof "hello", "hello" }; > static char *y = &x.buf; > > The reference to x.buf should be replaced by the .ACCESS_WITH_SIZE (&x.buf, > &x.size, 1, sizeof(x.size), -1). > > > 2.5 Query the size info > > There are multiple consumers of the size info (and ACCESS_MODE info): > > * __builtin_dynamic_object_size; > * array bound sanitizer; > > in these consumers, get the size info from the 2nd, 3rd, and 4th arguments of > the call to > ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, 1, sizeof(SIZE), -1) > > 2.6 Eliminate the internal function when not useful anymore > > After the last consumer of the size information in the ACCESS_WITH_SIZE, We > should replace the internal call with its first argument. > > Do it in the 2nd object size phase. > > 2.7 Adjustment to inlining heuristic, IPA alias analysis and other IPA > analysis > > the FE changes: > > obj->buf > > to > > _1 = obj->buf; > _2 = &(obj->size); > .ACCESS_WITH_SIZE (_1, _2, 1, sizeof (obj->size), -1) > > there are major two changes: > > A. the # of Insns of the routine will be increased, > B. additional calls to an non-pure internal routines. > > In order to minimize the impact on code generation and optimizaitons, We > might need to > > * Adjust Inlining decision to ignore the additional insns and calls; > * Adjust the routine "call_may_clobber_ref_p_1" in tree-ssa-alias.cc to > exclude the new internal .ACCESS_WITH_SIZE from clobbering anything. > > > 3. The patch sets: > > We will provide the following patches: > > 3.1 Provide counted_by attribute to flexible array member field; > which includes: > * "counted_by" attribute documentation; > * C FE handling of the new attribute; > syntax checking, error reporting; > * testing cases; (aditional testing cases for staticly initialized > object) > > 3.2 Convert "counted_by" attribute to/from .ACCESS_WITH_SIZE. > which includes: > * The definition of the new internal function .ACCESS_WITH_SIZE in > internal-fn.def. > * C FE converts every reference to a FAM with "counted_by" attribute to > a call to the internal function .ACCESS_WITH_SIZE. > (build_component_ref in c_typeck.cc) > * Convert every call to .ACCESS_WITH_SIZE to its first argument. > (After the size and access_mode information are used in the 2nd object > size phase, we need to convert .ACCESS_WITH_SIZE back to its first argument > as soon as possible in order to minimize the impact to other phases after > object size phase. So, do this in the end of the 2nd object size phase) > * adjust inlining heuristic for the additional call. > * adjust alias analysis to exclude the new internal from clobbering > anything. > (call_may_clobber_ref_p_1 in tree-ssa-alias.cc) > * verify a call to .ACCESS_WITH_SIZE in tree-cfg.cc. > * provide utility routines to get the size or ACCESS_MODE from the call > to .ACCESS_WITH_SIZE. > > 3.3 Use the .ACCESS_WITH_SIZE in builtin object size (sub-object only) > which includes: > * use the size info of the .ACCESS_WITH_SIZE for sub-object. > * testing cases. (additional testing cases for dynamic array support) > > 3.4 Use the .ACCESS_WITH_SIZE in bound sanitizer > which includes: > * use the size info of the .ACCESS_WITH_SIZE for bound sanitizer. > * testing cases. (additional testing cases for dynamic array support) > > 3.5 Improve __bdos to use the counted_by info in whole-object size for the > structure with FAM. > 3.6 Emit warnings when the user breaks the requirments for the new > counted_by attribute > compilation time: -Wcounted-by > run time: -fsanitizer=counted-by > > * The initialization to the size field should be done before the first > reference to the FAM field. > * the array has at least # of elements specified by the size field all > the time during the program. > > > 4. Future work > > 4.1 Use the new .ACCESS_WITH_SIZE for other relative attributes, for > example, "access" and "alloc_size", to resolve the known issues for them: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503 > > 4.2 Add "counted_by" attribute to general pointers as proposed by the > -fbounds-safety proposal to LLVM from Apple > (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854). > > > the .ACCESS_WITH_SIZE approach should be adopted naturally. > > > Appendex 1. A more complicated example: > > 1 #include <malloc.h> > 2 struct A > 3 { > 4 size_t size; > 5 char buf[] __attribute__((counted_by(size))); > 6 }; > 7 > 8 static size_t > 9 get_size_from (void *ptr) > 10 { > 11 return __builtin_dynamic_object_size (ptr, 1); > 12 } > 13 > 14 void > 15 foo (size_t sz) > 16 { > 17 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); > 18 obj->size = sz; > 19 obj->buf[0] = 2; > 20 __builtin_printf (“%d\n", get_size_from (obj->buf)); > 21 return; > 22 } > 23 > 24 int main () > 25 { > 26 foo (20); > 27 return 0; > 28 } > > In this example, the instantiation of the object at line 17 is not in the > same routine as the _BDOS call at line 11. > > Appendex 2. The user documentation: > > 'counted_by (COUNT)' > The 'counted_by' attribute may be attached to the flexible array > member of a structure. It indicates that the number of the > elements of the array is given by the field named "COUNT" in the > same structure as the flexible array member. GCC uses this > information to improve the results of the array bound sanitizer and > the '__builtin_dynamic_object_size'. > > For instance, the following code: > > struct P { > size_t count; > char other; > char array[] __attribute__ ((counted_by (count))); > } *p; > > specifies that the 'array' is a flexible array member whose number > of elements is given by the field 'count' in the same structure. > > The field that represents the number of the elements should have an > integer type. Otherwise, the compiler will report a warning and ignore > the attribute. > > An explicit 'counted_by' annotation defines a relationship between > two objects, 'p->array' and 'p->count', and there are the following > requirementthat on the relationship between this pair: > > A. 'p->count' should be initialized before the first reference to > 'p->array'. > B. "p->array' has _at least_ 'p->count' number of elements available > all the time. > This requirement must hold even after any of these related objects > are updated during the program. > > It's the user's responsibility to make sure the above requirments to > be kept all the time. Otherwise the compiler will report warnings, > at the same time, the results of the array bound sanitizer and the > '__builtin_dynamic_object_size' is undefined. > > One important feature of the attribute is, A reference to the flexible > array member field will use the latest value assigned to the field that > represent the number of the elements before that reference. For example, > p->count = val1; > ref1 (p->array); > p->count = val2; > ref2 (p->array); > in the above, 'ref1' will use 'val1' as the number of the elements in > 'p->array', and "ref2" will use 'val2' as the number of elements in > 'p->array'. > > > Appendex 3. Other approaches that have been discussed: > > A. Add an additional argument, the size parameter, to the call to __bdos, > A.1, during FE; > A.2, during gimplification phase; > B. Encode the implicit USE in the type of size, to make the size as > “volatile”; > C. Encode the implicit USE in the type of buf, then update the optimization > passes to use this implicit USE encoded in the type of buf. > > ** Approach A (both A.1 and A.2) will not work if the object instantiation is > not in the same routine as the __bdos call as shown in the above example; > ** Approach B will have a bigger performance impact due to the "volatile" > will inhibit a lot of optimizations; > ** Approach C will involve a lot of changes in GCC, and also not very > necessary since the ONLY implicit use of the size parameter is in the __bdos > call when __bdos can see the real object.