Hi Arthur,

Thanks for looking at this.

The way I understood that comment was to mean that pointers to char can alias 
with anything. 

Section 3.10 of the C++11 standard says:

"If a program attempts to access the stored value of an object through a 
glvalue of other than one of the following types the behavior is undefined:

— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type 
of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified 
version of the dynamic type
of the object,
— an aggregate or union type that includes one of the aforementioned types 
among its elements or nonstatic
data members (including, recursively, an element or non-static data member of a 
subaggregate
or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type 
of the object,
— a char or unsigned char type."

(I don't have the C Standard handy at the moment)

The way I am reading this is that "char *p; p = ...; d = *p;", where d is an 
int, will result in "d = *p" not getting hoisted out, since *p is an lvalue of 
type char. That is defined behaviour.  On the other hand, for "int *p; p = &c; 
d = *p;" where c is a char, "d = *p" is free to get hoisted out since *p is an 
lvalue of type int, which isn't the dynamic type of the object (char), or any 
of the other classifications from the list.  Am I misinterpreting the standard 
here?

If not, then having "char *p;" and "b" as an int array should produce no change 
in the aliasing information.  With this change, aliasing is identical for the 
following compared to what is currently generated.  

    char *p;
    typedef struct {
      char a;
      int b[100];
      char c;
    } S;

    S x;

    void func1 (char d) {
      for (int i = 0; i < 100; i++) {
        x.b[i] += 1;
        d = *p;  <==== will not get hoisted
        x.a += d;
      }
    }

I get the following (for aarch64), where "!tbaa !7" that is associated with "d 
= *p" has the base of "omnipotent char", which will prevent it from getting 
hoisted.  For example, TBAA will determine that !7 is aliased with !1 when it's 
doing analysis on "store i32 %add, i32* %arrayidx, align 4, !tbaa !1" and "%2 = 
load i8* %1, align 1, !tbaa !7"  

!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"int", metadata !3, i64 0}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
!5 = metadata !{metadata !6, metadata !6, i64 0}
!6 = metadata !{metadata !"any pointer", metadata !3, i64 0}
!7 = metadata !{metadata !3, metadata !3, i64 0}
!8 = metadata !{metadata !9, metadata !3, i64 0}
!9 = metadata !{metadata !"", metadata !3, i64 0, metadata !3, i64 4, metadata 
!3, i64 404}


for.body:                                         ; preds = %for.cond
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i32]* getelementptr inbounds 
(%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
  %0 = load i32* %arrayidx, align 4, !tbaa !1
  %add = add nsw i32 %0, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !1  <============ !1 and !7 
are aliased
  %1 = load i8** @p, align 8, !tbaa !5
  %2 = load i8* %1, align 1, !tbaa !7  <==========
  %conv = zext i8 %2 to i32
  %3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, 
!tbaa !8
  %conv1 = zext i8 %3 to i32
  %add2 = add nsw i32 %conv1, %conv
  %conv3 = trunc i32 %add2 to i8
  store i8 %conv3, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), 
align 1, !tbaa !8
  %inc = add nsw i32 %i.0, 1
  br label %for.cond



This change is in CGExpr.cpp in EmitArraySubscriptExpr when dealing with 
accesses to non-VLAs, so that the load/store that is produced is annotated with 
a tbaa tag that doesn't have "omnipotent char" as the base.

If we go back to the original case with "int *p;" and b as the char array, 
today we get the following where "store i8 %conv1, i8* %arrayidx, align 1, 
!tbaa !1" and "%2 = load i32* %1, align 4, !tbaa !6 " are aliased with each 
other due to !tbaa !1 having the base type as "omnipotent char".  So, "x.b[i] = 
..." is preventing "d = *p" from getting hoisted out.   This is undefined 
behaviour if I am reading the standard correctly.  If that is the case, I think 
we should be able to hoist "d = *p" out of the loop.

!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"any pointer", metadata !2, i64 0}
!6 = metadata !{metadata !7, metadata !7, i64 0}
!7 = metadata !{metadata !"int", metadata !2, i64 0}
!8 = metadata !{metadata !9, metadata !2, i64 0}
!9 = metadata !{metadata !"", metadata !2, i64 0, metadata !2, i64 1, metadata 
!2, i64 101}

for.body:                                         ; preds = %for.cond
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i8]* getelementptr inbounds 
(%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
  %0 = load i8* %arrayidx, align 1, !tbaa !1
  %conv = zext i8 %0 to i32
  %add = add nsw i32 %conv, 1
  %conv1 = trunc i32 %add to i8
  store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1   <====== !1 and !6 are 
aliased
  %1 = load i32** @p, align 8, !tbaa !4
  %2 = load i32* %1, align 4, !tbaa !6  <=====
  %conv2 = trunc i32 %2 to i8
  %conv3 = zext i8 %conv2 to i32
  %3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, 
!tbaa !8
  %conv4 = zext i8 %3 to i32
  %add5 = add nsw i32 %conv4, %conv3
  %conv6 = trunc i32 %add5 to i8
  store i8 %conv6, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), 
align 1, !tbaa !8
  %inc = add nsw i32 %i.0, 1
  br label %for.cond


With the change, we get the following, where !7 and !1 are not aliased with 
each other since !1 has the base type "char".

!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"char", metadata !3, i64 0}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
!5 = metadata !{metadata !6, metadata !6, i64 0}
!6 = metadata !{metadata !"any pointer", metadata !3, i64 0}
!7 = metadata !{metadata !8, metadata !8, i64 0}
!8 = metadata !{metadata !"int", metadata !3, i64 0}
!9 = metadata !{metadata !10, metadata !3, i64 0}
!10 = metadata !{metadata !"", metadata !3, i64 0, metadata !3, i64 1, metadata 
!3, i64 101}

for.body:                                         ; preds = %for.cond
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i8]* getelementptr inbounds 
(%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
  %0 = load i8* %arrayidx, align 1, !tbaa !1
  %conv = zext i8 %0 to i32
  %add = add nsw i32 %conv, 1
  %conv1 = trunc i32 %add to i8
  store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1  <=====  !1 and !7 are not 
aliased anymore
  %1 = load i32** @p, align 8, !tbaa !5
  %2 = load i32* %1, align 4, !tbaa !7  <=====
  %conv2 = trunc i32 %2 to i8
  %conv3 = zext i8 %conv2 to i32
  %3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1, 
!tbaa !9
  %conv4 = zext i8 %3 to i32
  %add5 = add nsw i32 %conv4, %conv3
  %conv6 = trunc i32 %add5 to i8
  store i8 %conv6, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), 
align 1, !tbaa !9
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

The "omnipotent char" as the base is preserved when dereferencing pointers to 
char, etc.  The only thing that changes is the underlying base type of the TBAA 
tag associated with accesses to elements of non-VLAs of type char.  When the 
base type on the tbaa tag is changed to "char", it should ensure that these 
accesses are not aliased with any pointers other than pointers to char.

Thanks,
Sanjin

-----Original Message-----
From: Arthur O'Dwyer [mailto:[email protected]] 
Sent: Wednesday, June 25, 2014 7:35 PM
To: Sanjin Sijaric
Cc: cfe-commits
Subject: Re: [PATCH] More precise aliasing for char arrays

On Wed, Jun 25, 2014 at 3:26 PM, Sanjin Sijaric <[email protected]> wrote:
>
>>     int *p;
>>     typedef struct {
>>       char a;
>>       char b[100];
>>       char c;
>>     } S;
>>
>>     S x;
>>
>>     void func1 (char d) {
>>       for (int i = 0; i < 100; i++) {
>>         x.b[i] += 1;
>>         d = *p;
>>         x.a += d;
>>       }
>>     }
>>
>> It seems like you want the compiler to hoist the read of `*p` above the 
>> write to `x.b[i]`.
>> But that isn't generally possible, is it? because the caller might have 
>> executed
>>
>>    p = &x.b[3];
>>
>> before the call to func1.
>
> Here, "p" is a pointer to int, whereas b is a char array.  Wouldn't "p = 
> &x.b[3];" break ansi aliasing rules?

I was under the impression that that was the entire point of "omnipotent char"!
See lines 106-108 in the very file you're changing:

00106     // Character types are special and can alias anything.
00107     // In C++, this technically only includes "char" and "unsigned char",
00108     // and not "signed char". In C, it includes all three. For now,
00109     // the risk of exploiting this detail in C++ seems likely to outweigh
00110     // the benefit.

Source: http://clang.llvm.org/doxygen/CodeGenTBAA_8cpp_source.html

–Arthur


_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to