> On Oct 24, 2023, at 4:38 PM, Martin Uecker <uec...@tugraz.at> wrote:
> 
> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>> Hi, Sid,
>> 
>> Really appreciate for your example and detailed explanation. Very helpful.
>> I think that this example is an excellent example to show (almost) all the 
>> issues we need to consider.
>> 
>> I slightly modified this example to make it to be compilable and run-able, 
>> as following: 
>> (but I still cannot make the incorrect reordering or DSE happening, anyway, 
>> the potential reordering possibility is there…)
>> 
>>  1 #include <malloc.h>
>>  2 struct A
>>  3 {
>>  4  size_t size;
>>  5  char buf[] __attribute__((counted_by(size)));
>>  6 };
>>  7 
>>  8 static size_t
>>  9 get_size_from (void *ptr)
>> 10 {
>> 11  return __builtin_dynamic_object_size (ptr, 1);
>> 12 }
>> 13 
>> 14 void
>> 15 foo (size_t sz)
>> 16 {
>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 18  obj->size = sz;
>> 19  obj->buf[0] = 2;
>> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>> 21  return;
>> 22 }
>> 23 
>> 24 int main ()
>> 25 {
>> 26  foo (20);
>> 27  return 0;
>> 28 }
>> 
>> With my GCC, it was compiled and worked:
>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> 20
>> Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
>> “foo”, therefore, the call to __bdos is in the same routine as the 
>> instantiation of the object, and the TYPE information and the attached 
>> counted_by attribute information in the TYPE of the object can be USED by 
>> the __bdos call to compute the final object size. 
>> 
>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>> -1
>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
>> “foo”, therefore, the call to __bdos is Not in the same routine as the 
>> instantiation of the object, As a result, the TYPE info and the attached 
>> counted_by info of the object can NOT be USED by the __bdos call. 
>> 
>> Keep in mind of the above 2 situations, we will refer them in below:
>> 
>> 1. First,  the problem we are trying to resolve is:
>> 
>> (Your description):
>> 
>>> the reordering of __bdos w.r.t. initialization of the size parameter but to 
>>> also account for DSE of the assignment, we can abstract this problem to 
>>> that of DFA being unable to see implicit use of the size parameter in the 
>>> __bdos call.
>> 
>> basically is correct.  However, with the following exception:
>> 
>> The implicit use of the size parameter in the __bdos call is not always 
>> there, it ONLY exists WHEN the __bdos is able to evaluated to an expression 
>> of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the 
>> above example. 
>> In the “Situation 2”, when the __bdos does not see the TYPE of the real 
>> object,  it does not see the counted_by information from the TYPE, 
>> therefore,  it is not able to evaluate the size of the object through the 
>> counted_by information.  As a result, the implicit use of the size parameter 
>> in the __bdos call does NOT exist at all.  The optimizer can freely reorder 
>> the initialization of the size parameter with the __bdos call since there is 
>> no data flow dependency between these two. 
>> 
>> With this exception in mind, we can see that your proposed “option 2” 
>> (making the type of size “volatile”) is too conservative, it will  disable 
>> many optimizations  unnecessarily, even though it’s safe and simple to 
>> implement. 
>> 
>> As a compiler optimization person for many many years, I really don’t want 
>> to take this approach at this moment.  -:)
>> 
>> 2. Some facts I’d like to mention:
>> 
>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
>> optimization stage. During RTL stage,  the __bdos call has already been 
>> replaced by an expression of the size parameter or a constant, the data 
>> dependency is explicitly in the IR already.  I believe that the data 
>> analysis in RTL stage should pick up the data dependency correctly, No 
>> special handling is needed in RTL.
>> 
>> B. If the __bdos call cannot see the real object , it has no way to get the 
>> “counted_by” field from the TYPE of the real object. So, if we try to add 
>> the implicit use of the “counted_by” field to the __bdos call, the object 
>> instantiation should be in the same routine as the __bdos call.  Both the FE 
>> and the gimplification phase are too early to do this work. 
>> 
>> 2. Then, what’s the best approach to resolve this problem:
>> 
>> There were several suggestions so far:
>> 
>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>      A.1, during FE;
>>      A.2, during gimplification phase;
>> B.  Encode the implicit USE  in the type of size, to make the size 
>> “volatile”;
>> C.  Encode the implicit USE  in the type of buf, then update the 
>> optimization passes to use this implicit USE encoded in the type of buf.
>> 
>> As I explained in the above, 
>> ** Approach A (both A.1 and A.2) does not work;
>> ** Approach B will have big performance impact, I’d prefer not to take this 
>> approach at this moment.
>> ** Approach C will be a lot of change in GCC, and also not very necessary 
>> since the ONLY implicit use of the size parameter is in the __bdos call when 
>> __bdos can see the real object.
>> 
>> So, all the above proposed approaches, A, B, C, are not very good. 
>> 
>> Then, maybe the following might work better?
>> 
>> In the tree optimization stage, 
>>    * After the inlining transformation applied,  
>> +  * Before the data-flow related optimization happens, 
>> +  * when the data flow analysis is constructed, 
>> 
>> For each call to __bdos, add the implicit use of size parameter. 
>> 
>> Is this doable? 
> 
> Here is another proposal:  Add a new builtin function
> 
> __builtin_with_size(x, size)
> 
> that return x but behaves similar to an allocation
> function in that BDOS can look at the size argument
> to discover the size.
> 
> The FE insers this function when the field is accessed:
> 
> __builtin_with_size(x.buf, x.L);

Thanks for the proposal!

So what you suggested is:

For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, 
then the call to the _bdos (x.buf, 1) will
Become:

   _bdos(__builtin_with_size(x.buf, x.L), 1)?

Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?

This looks like a very promising solution.

Will study this a. Little bit more.

Qing
> 
> 
> Martin
> 
> 
> 
>> 
>> Otherwise, we might need to take the “volatile” approach. 
>> 
>> Let me know your suggestion and comment.
>> 
>> Thanks a lot.
>> 
>> Qing
>> 
>> 
>>> __bdos is the one such implicit user of the size parameter and you're 
>>> proposing to solve this by encoding the relationship between buffer and 
>>> size at the __bdos call site.  But what about the case when the 
>>> instantiation of the object is not at the same place as the __bdos call 
>>> site, i.e. the DFA is unable to make that relationship?
>>> 
>>> The example Martin showed where the subobject gets "hidden" behind a 
>>> pointer was a trivial one where DFA *may* actually work in practice 
>>> (because the object-size pass can thread through these assignments) but 
>>> think about this one:
>>> 
>>> struct A
>>> {
>>> size_t size;
>>> char buf[] __attribute__((counted_by(size)));
>>> }
>>> 
>>> static size_t
>>> get_size_of (void *ptr)
>>> {
>>> return __bdos (ptr, 1);
>>> }
>>> 
>>> void
>>> foo (size_t sz)
>>> {
>>> struct A *obj = __builtin_malloc (sz);
>>> obj->size = sz;
>>> 
>>> ...
>>> __builtin_printf ("%zu\n", get_size_of (obj->array));
>>> ...
>>> }
>>> 
>>> Until get_size_of is inlined, no DFA can see the __bdos call in the same 
>>> place as the point where obj is allocated.  As a result, the assignment to 
>>> obj->size could get reordered (or the store eliminated) w.r.t. the __bdos 
>>> call until the inlining happens.
>>> 
>>> As a result, the relationship between buf and size established by the 
>>> attribute needs to be encoded into the type somehow.  There are two options:
>>> 
>>> Option 1: Encode the relationship in the type of buf
>>> 
>>> This is kinda what you end up doing with component_ref_has_counted_by and 
>>> it does show the relationship if one is looking (through that call), but 
>>> nothing more that can be used to, e.g. prevent reordering or tell the 
>>> optimizer that the reference to the buf member may imply a reference to the 
>>> size member as well.  This could be remedied by somehow encoding the USES 
>>> relationship for size into the type of buf that the optimization passes can 
>>> see.  I feel like this may be a bit convoluted to specify in a future 
>>> language extension in a way that will actually be well understood by 
>>> developers, but it will likely generate faster runtime code.  This will 
>>> also likely require a bigger change across passes.
>>> 
>>> Option 2: Encode the relationship in the type of size
>>> 
>>> The other option is to enhance the type of size somehow so that it 
>>> discourages reordering and store elimination, basically pessimizing code.  
>>> I think volatile semantics might be the way to do this and may even be 
>>> straightforward to specify in the future language extension given that it 
>>> builds on a known language construct and is thematically related.  However 
>>> it does pessimize output for code that implements __counted_by__.
>>> 
>>> Thanks,
>>> Sid

Reply via email to