> On Oct 25, 2023, at 6:39 AM, Martin Uecker <uec...@tugraz.at> wrote:
> 
> Am Mittwoch, dem 25.10.2023 um 12:25 +0200 schrieb Richard Biener:
>> 
>>> Am 25.10.2023 um 10:16 schrieb Martin Uecker <uec...@tugraz.at>:
>>> 
>>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener:
>>>> 
>>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uec...@tugraz.at>:
>>>>> 
>>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
>>>>>> Hi, Sid,
>>>>>> 
>>>>>> Really appreciate for your example and detailed explanation. Very 
>>>>>> helpful.
>>>>>> I think that this example is an excellent example to show (almost) all 
>>>>>> the issues we need to consider.
>>>>>> 
>>>>>> I slightly modified this example to make it to be compilable and 
>>>>>> run-able, as following: 
>>>>>> (but I still cannot make the incorrect reordering or DSE happening, 
>>>>>> anyway, the potential reordering possibility is there…)
>>>>>> 
>>>>>> 1 #include <malloc.h>
>>>>>> 2 struct A
>>>>>> 3 {
>>>>>> 4  size_t size;
>>>>>> 5  char buf[] __attribute__((counted_by(size)));
>>>>>> 6 };
>>>>>> 7 
>>>>>> 8 static size_t
>>>>>> 9 get_size_from (void *ptr)
>>>>>> 10 {
>>>>>> 11  return __builtin_dynamic_object_size (ptr, 1);
>>>>>> 12 }
>>>>>> 13 
>>>>>> 14 void
>>>>>> 15 foo (size_t sz)
>>>>>> 16 {
>>>>>> 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
>>>>>> sizeof(char));
>>>>>> 18  obj->size = sz;
>>>>>> 19  obj->buf[0] = 2;
>>>>>> 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>>>>>> 21  return;
>>>>>> 22 }
>>>>>> 23 
>>>>>> 24 int main ()
>>>>>> 25 {
>>>>>> 26  foo (20);
>>>>>> 27  return 0;
>>>>>> 28 }
>>>>>> 
>>>>>> With my GCC, it was compiled and worked:
>>>>>> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
>>>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>>>> 20
>>>>>> Situation 1: With O1 and above, the routine “get_size_from” was inlined 
>>>>>> into “foo”, therefore, the call to __bdos is in the same routine as the 
>>>>>> instantiation of the object, and the TYPE information and the attached 
>>>>>> counted_by attribute information in the TYPE of the object can be USED 
>>>>>> by the __bdos call to compute the final object size. 
>>>>>> 
>>>>>> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
>>>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
>>>>>> -1
>>>>>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
>>>>>> “foo”, therefore, the call to __bdos is Not in the same routine as the 
>>>>>> instantiation of the object, As a result, the TYPE info and the attached 
>>>>>> counted_by info of the object can NOT be USED by the __bdos call. 
>>>>>> 
>>>>>> Keep in mind of the above 2 situations, we will refer them in below:
>>>>>> 
>>>>>> 1. First,  the problem we are trying to resolve is:
>>>>>> 
>>>>>> (Your description):
>>>>>> 
>>>>>>> the reordering of __bdos w.r.t. initialization of the size parameter 
>>>>>>> but to also account for DSE of the assignment, we can abstract this 
>>>>>>> problem to that of DFA being unable to see implicit use of the size 
>>>>>>> parameter in the __bdos call.
>>>>>> 
>>>>>> basically is correct.  However, with the following exception:
>>>>>> 
>>>>>> The implicit use of the size parameter in the __bdos call is not always 
>>>>>> there, it ONLY exists WHEN the __bdos is able to evaluated to an 
>>>>>> expression of the size parameter in the “objsz” phase, i.e., the 
>>>>>> “Situation 1” of the above example. 
>>>>>> In the “Situation 2”, when the __bdos does not see the TYPE of the real 
>>>>>> object,  it does not see the counted_by information from the TYPE, 
>>>>>> therefore,  it is not able to evaluate the size of the object through 
>>>>>> the counted_by information.  As a result, the implicit use of the size 
>>>>>> parameter in the __bdos call does NOT exist at all.  The optimizer can 
>>>>>> freely reorder the initialization of the size parameter with the __bdos 
>>>>>> call since there is no data flow dependency between these two. 
>>>>>> 
>>>>>> With this exception in mind, we can see that your proposed “option 2” 
>>>>>> (making the type of size “volatile”) is too conservative, it will  
>>>>>> disable many optimizations  unnecessarily, even though it’s safe and 
>>>>>> simple to implement. 
>>>>>> 
>>>>>> As a compiler optimization person for many many years, I really don’t 
>>>>>> want to take this approach at this moment.  -:)
>>>>>> 
>>>>>> 2. Some facts I’d like to mention:
>>>>>> 
>>>>>> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
>>>>>> optimization stage. During RTL stage,  the __bdos call has already been 
>>>>>> replaced by an expression of the size parameter or a constant, the data 
>>>>>> dependency is explicitly in the IR already.  I believe that the data 
>>>>>> analysis in RTL stage should pick up the data dependency correctly, No 
>>>>>> special handling is needed in RTL.
>>>>>> 
>>>>>> B. If the __bdos call cannot see the real object , it has no way to get 
>>>>>> the “counted_by” field from the TYPE of the real object. So, if we try 
>>>>>> to add the implicit use of the “counted_by” field to the __bdos call, 
>>>>>> the object instantiation should be in the same routine as the __bdos 
>>>>>> call.  Both the FE and the gimplification phase are too early to do this 
>>>>>> work. 
>>>>>> 
>>>>>> 2. Then, what’s the best approach to resolve this problem:
>>>>>> 
>>>>>> There were several suggestions so far:
>>>>>> 
>>>>>> A.  Add an additional argument, the size parameter,  to __bdos, 
>>>>>>    A.1, during FE;
>>>>>>    A.2, during gimplification phase;
>>>>>> B.  Encode the implicit USE  in the type of size, to make the size 
>>>>>> “volatile”;
>>>>>> C.  Encode the implicit USE  in the type of buf, then update the 
>>>>>> optimization passes to use this implicit USE encoded in the type of buf.
>>>>>> 
>>>>>> As I explained in the above, 
>>>>>> ** Approach A (both A.1 and A.2) does not work;
>>>>>> ** Approach B will have big performance impact, I’d prefer not to take 
>>>>>> this approach at this moment.
>>>>>> ** Approach C will be a lot of change in GCC, and also not very 
>>>>>> necessary since the ONLY implicit use of the size parameter is in the 
>>>>>> __bdos call when __bdos can see the real object.
>>>>>> 
>>>>>> So, all the above proposed approaches, A, B, C, are not very good. 
>>>>>> 
>>>>>> Then, maybe the following might work better?
>>>>>> 
>>>>>> In the tree optimization stage, 
>>>>>>  * After the inlining transformation applied,  
>>>>>> +  * Before the data-flow related optimization happens, 
>>>>>> +  * when the data flow analysis is constructed, 
>>>>>> 
>>>>>> For each call to __bdos, add the implicit use of size parameter. 
>>>>>> 
>>>>>> Is this doable? 
>>>>> 
>>>>> Here is another proposal:  Add a new builtin function
>>>>> 
>>>>> __builtin_with_size(x, size)
>>>>> 
>>>>> that return x but behaves similar to an allocation
>>>>> function in that BDOS can look at the size argument
>>>>> to discover the size.
>>>>> 
>>>>> The FE insers this function when the field is accessed:
>>>> 
>>>> When it’s set I suppose.  Turn
>>>> 
>>>> X.l = n;
>>>> 
>>>> Into
>>>> 
>>>> X.l = __builtin_with_size (x.buf, n);
>>> 
>>> It would turn 
>>> 
>>> some_variable = (&) x.buf
>>> 
>>> into 
>>> 
>>> some_variable = __builtin_with_size ( (&) x.buf. x.len)
>> 
>> Unless you use the address of x.Len this will not work when len is 
>> initialized after buf.  And the address will not have a meaningful data 
>> dependence.
>>> 
> 
> It would be a semantic requirement for this feature that
> x.len needs to be initialized before x.buf is accessed.  

Yes, that’s right, we might need to clarify this into the documentation of the 
counted_by. 
It should be a user error if the source code violate this rule.

Qing
> 
> Otherwise, I am not sure how to define the time point 
> at which x.len should be evaluated. 
> 
>>> So the later access to x.buf and not the initialization
>>> of a member of the struct (which is too early).
>> 
>>>> 
>>>> And indeed we need sth like a fat pointer to reliably solve all the issues.
>>> 
>>> What happens for other languages such as FORTRAN 
>>> and ADA do?  Are those pointers lowered in the FE?
>> 
>> Yes
>> 
>>> To me it seems there are two sound ways to introduce
>>> such information:
>>> 
>>> - either by using the type system.  This works in
>>> the FE in C using variably modified types
>>> 
>>> char buf[n];
>>> __auto_type p = &buf;
>>> 
>>> ... = sizeof (*p);
>>> 
>>> But if I understand Jakob's comment to some PR 
>>> correctly the size information in the TREE_TYPE
>>> is not processed correctly anymore in the
>>> middle-end. 
>> 
>> The type based info is lowered during gimplification and in particular for 
>> pointer types the middle-end quickly loses track of the original type.
>> 
> 
> Would it work if we make sure that we find a suitable
> type? Or in other words, are the (non-constant) size 
> expressions inside it still useful in later passes? 
> 
> Martin
> 
> 
>> Richard 
>> 
>>> 
>>> - or one injects the information via some
>>> tree node or builtin at certain points in
>>> time as suggested here, and the compiler
>>> derives the information from these points 
>>> as tree-object-size does.  
>>> 
>>> 
>>> The use of attributes seems fragile and - looking
>>> at the access attribute also overly complex.  And 
>>> we somehow support this only for function types
>>> and not elsewhere and also this then gets lost
>>> during  inlining.   So I think for all this stuff
>>> (nonnull, access, counted_by) I think a better
>>> approach is needed.
>>> 
>>> 
>>> Martin
>>> 
>>> 
>>>> 
>>>> Richard 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>>> __builtin_with_size(x.buf, x.L);
>>>>> 
>>>>> 
>>>>> Martin
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Otherwise, we might need to take the “volatile” approach. 
>>>>>> 
>>>>>> Let me know your suggestion and comment.
>>>>>> 
>>>>>> Thanks a lot.
>>>>>> 
>>>>>> Qing
>>>>>> 
>>>>>> 
>>>>>>> __bdos is the one such implicit user of the size parameter and you're 
>>>>>>> proposing to solve this by encoding the relationship between buffer and 
>>>>>>> size at the __bdos call site.  But what about the case when the 
>>>>>>> instantiation of the object is not at the same place as the __bdos call 
>>>>>>> site, i.e. the DFA is unable to make that relationship?
>>>>>>> 
>>>>>>> The example Martin showed where the subobject gets "hidden" behind a 
>>>>>>> pointer was a trivial one where DFA *may* actually work in practice 
>>>>>>> (because the object-size pass can thread through these assignments) but 
>>>>>>> think about this one:
>>>>>>> 
>>>>>>> struct A
>>>>>>> {
>>>>>>> size_t size;
>>>>>>> char buf[] __attribute__((counted_by(size)));
>>>>>>> }
>>>>>>> 
>>>>>>> static size_t
>>>>>>> get_size_of (void *ptr)
>>>>>>> {
>>>>>>> return __bdos (ptr, 1);
>>>>>>> }
>>>>>>> 
>>>>>>> void
>>>>>>> foo (size_t sz)
>>>>>>> {
>>>>>>> struct A *obj = __builtin_malloc (sz);
>>>>>>> obj->size = sz;
>>>>>>> 
>>>>>>> ...
>>>>>>> __builtin_printf ("%zu\n", get_size_of (obj->array));
>>>>>>> ...
>>>>>>> }
>>>>>>> 
>>>>>>> Until get_size_of is inlined, no DFA can see the __bdos call in the 
>>>>>>> same place as the point where obj is allocated.  As a result, the 
>>>>>>> assignment to obj->size could get reordered (or the store eliminated) 
>>>>>>> w.r.t. the __bdos call until the inlining happens.
>>>>>>> 
>>>>>>> As a result, the relationship between buf and size established by the 
>>>>>>> attribute needs to be encoded into the type somehow.  There are two 
>>>>>>> options:
>>>>>>> 
>>>>>>> Option 1: Encode the relationship in the type of buf
>>>>>>> 
>>>>>>> This is kinda what you end up doing with component_ref_has_counted_by 
>>>>>>> and it does show the relationship if one is looking (through that 
>>>>>>> call), but nothing more that can be used to, e.g. prevent reordering or 
>>>>>>> tell the optimizer that the reference to the buf member may imply a 
>>>>>>> reference to the size member as well.  This could be remedied by 
>>>>>>> somehow encoding the USES relationship for size into the type of buf 
>>>>>>> that the optimization passes can see.  I feel like this may be a bit 
>>>>>>> convoluted to specify in a future language extension in a way that will 
>>>>>>> actually be well understood by developers, but it will likely generate 
>>>>>>> faster runtime code.  This will also likely require a bigger change 
>>>>>>> across passes.
>>>>>>> 
>>>>>>> Option 2: Encode the relationship in the type of size
>>>>>>> 
>>>>>>> The other option is to enhance the type of size somehow so that it 
>>>>>>> discourages reordering and store elimination, basically pessimizing 
>>>>>>> code.  I think volatile semantics might be the way to do this and may 
>>>>>>> even be straightforward to specify in the future language extension 
>>>>>>> given that it builds on a known language construct and is thematically 
>>>>>>> related.  However it does pessimize output for code that implements 
>>>>>>> __counted_by__.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sid
>>>>>> 
>>>>> 
>>> 
>>> -- 
>>> Univ.-Prof. Dr. rer. nat. Martin Uecker
>>> Graz University of Technology
>>> Institute of Biomedical Imaging
>>> 
>>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging
> 
> 

Reply via email to