Hi Arthur,
Thanks for looking at this.
The way I understood that comment was to mean that pointers to char can alias
with anything.
Section 3.10 of the C++11 standard says:
"If a program attempts to access the stored value of an object through a
glvalue of other than one of the following types the behavior is undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type
of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified
version of the dynamic type
of the object,
— an aggregate or union type that includes one of the aforementioned types
among its elements or nonstatic
data members (including, recursively, an element or non-static data member of a
subaggregate
or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type
of the object,
— a char or unsigned char type."
(I don't have the C Standard handy at the moment)
The way I am reading this is that "char *p; p = ...; d = *p;", where d is an
int, will result in "d = *p" not getting hoisted out, since *p is an lvalue of
type char. That is defined behaviour. On the other hand, for "int *p; p = &c;
d = *p;" where c is a char, "d = *p" is free to get hoisted out since *p is an
lvalue of type int, which isn't the dynamic type of the object (char), or any
of the other classifications from the list. Am I misinterpreting the standard
here?
If not, then having "char *p;" and "b" as an int array should produce no change
in the aliasing information. With this change, aliasing is identical for the
following compared to what is currently generated.
char *p;
typedef struct {
char a;
int b[100];
char c;
} S;
S x;
void func1 (char d) {
for (int i = 0; i < 100; i++) {
x.b[i] += 1;
d = *p; <==== will not get hoisted
x.a += d;
}
}
I get the following (for aarch64), where "!tbaa !7" that is associated with "d
= *p" has the base of "omnipotent char", which will prevent it from getting
hoisted. For example, TBAA will determine that !7 is aliased with !1 when it's
doing analysis on "store i32 %add, i32* %arrayidx, align 4, !tbaa !1" and "%2 =
load i8* %1, align 1, !tbaa !7"
!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"int", metadata !3, i64 0}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
!5 = metadata !{metadata !6, metadata !6, i64 0}
!6 = metadata !{metadata !"any pointer", metadata !3, i64 0}
!7 = metadata !{metadata !3, metadata !3, i64 0}
!8 = metadata !{metadata !9, metadata !3, i64 0}
!9 = metadata !{metadata !"", metadata !3, i64 0, metadata !3, i64 4, metadata
!3, i64 404}
for.body: ; preds = %for.cond
%idxprom = sext i32 %i.0 to i64
%arrayidx = getelementptr inbounds [100 x i32]* getelementptr inbounds
(%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
%0 = load i32* %arrayidx, align 4, !tbaa !1
%add = add nsw i32 %0, 1
store i32 %add, i32* %arrayidx, align 4, !tbaa !1 <============ !1 and !7
are aliased
%1 = load i8** @p, align 8, !tbaa !5
%2 = load i8* %1, align 1, !tbaa !7 <==========
%conv = zext i8 %2 to i32
%3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1,
!tbaa !8
%conv1 = zext i8 %3 to i32
%add2 = add nsw i32 %conv1, %conv
%conv3 = trunc i32 %add2 to i8
store i8 %conv3, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0),
align 1, !tbaa !8
%inc = add nsw i32 %i.0, 1
br label %for.cond
This change is in CGExpr.cpp in EmitArraySubscriptExpr when dealing with
accesses to non-VLAs, so that the load/store that is produced is annotated with
a tbaa tag that doesn't have "omnipotent char" as the base.
If we go back to the original case with "int *p;" and b as the char array,
today we get the following where "store i8 %conv1, i8* %arrayidx, align 1,
!tbaa !1" and "%2 = load i32* %1, align 4, !tbaa !6 " are aliased with each
other due to !tbaa !1 having the base type as "omnipotent char". So, "x.b[i] =
..." is preventing "d = *p" from getting hoisted out. This is undefined
behaviour if I am reading the standard correctly. If that is the case, I think
we should be able to hoist "d = *p" out of the loop.
!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"any pointer", metadata !2, i64 0}
!6 = metadata !{metadata !7, metadata !7, i64 0}
!7 = metadata !{metadata !"int", metadata !2, i64 0}
!8 = metadata !{metadata !9, metadata !2, i64 0}
!9 = metadata !{metadata !"", metadata !2, i64 0, metadata !2, i64 1, metadata
!2, i64 101}
for.body: ; preds = %for.cond
%idxprom = sext i32 %i.0 to i64
%arrayidx = getelementptr inbounds [100 x i8]* getelementptr inbounds
(%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
%0 = load i8* %arrayidx, align 1, !tbaa !1
%conv = zext i8 %0 to i32
%add = add nsw i32 %conv, 1
%conv1 = trunc i32 %add to i8
store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1 <====== !1 and !6 are
aliased
%1 = load i32** @p, align 8, !tbaa !4
%2 = load i32* %1, align 4, !tbaa !6 <=====
%conv2 = trunc i32 %2 to i8
%conv3 = zext i8 %conv2 to i32
%3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1,
!tbaa !8
%conv4 = zext i8 %3 to i32
%add5 = add nsw i32 %conv4, %conv3
%conv6 = trunc i32 %add5 to i8
store i8 %conv6, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0),
align 1, !tbaa !8
%inc = add nsw i32 %i.0, 1
br label %for.cond
With the change, we get the following, where !7 and !1 are not aliased with
each other since !1 has the base type "char".
!0 = metadata !{metadata !"clang version 3.5.0 "}
!1 = metadata !{metadata !2, metadata !2, i64 0}
!2 = metadata !{metadata !"char", metadata !3, i64 0}
!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
!4 = metadata !{metadata !"Simple C/C++ TBAA"}
!5 = metadata !{metadata !6, metadata !6, i64 0}
!6 = metadata !{metadata !"any pointer", metadata !3, i64 0}
!7 = metadata !{metadata !8, metadata !8, i64 0}
!8 = metadata !{metadata !"int", metadata !3, i64 0}
!9 = metadata !{metadata !10, metadata !3, i64 0}
!10 = metadata !{metadata !"", metadata !3, i64 0, metadata !3, i64 1, metadata
!3, i64 101}
for.body: ; preds = %for.cond
%idxprom = sext i32 %i.0 to i64
%arrayidx = getelementptr inbounds [100 x i8]* getelementptr inbounds
(%struct.S* @x, i32 0, i32 1), i32 0, i64 %idxprom
%0 = load i8* %arrayidx, align 1, !tbaa !1
%conv = zext i8 %0 to i32
%add = add nsw i32 %conv, 1
%conv1 = trunc i32 %add to i8
store i8 %conv1, i8* %arrayidx, align 1, !tbaa !1 <===== !1 and !7 are not
aliased anymore
%1 = load i32** @p, align 8, !tbaa !5
%2 = load i32* %1, align 4, !tbaa !7 <=====
%conv2 = trunc i32 %2 to i8
%conv3 = zext i8 %conv2 to i32
%3 = load i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0), align 1,
!tbaa !9
%conv4 = zext i8 %3 to i32
%add5 = add nsw i32 %conv4, %conv3
%conv6 = trunc i32 %add5 to i8
store i8 %conv6, i8* getelementptr inbounds (%struct.S* @x, i32 0, i32 0),
align 1, !tbaa !9
%inc = add nsw i32 %i.0, 1
br label %for.cond
The "omnipotent char" as the base is preserved when dereferencing pointers to
char, etc. The only thing that changes is the underlying base type of the TBAA
tag associated with accesses to elements of non-VLAs of type char. When the
base type on the tbaa tag is changed to "char", it should ensure that these
accesses are not aliased with any pointers other than pointers to char.
Thanks,
Sanjin
-----Original Message-----
From: Arthur O'Dwyer [mailto:[email protected]]
Sent: Wednesday, June 25, 2014 7:35 PM
To: Sanjin Sijaric
Cc: cfe-commits
Subject: Re: [PATCH] More precise aliasing for char arrays
On Wed, Jun 25, 2014 at 3:26 PM, Sanjin Sijaric <[email protected]> wrote:
>
>> int *p;
>> typedef struct {
>> char a;
>> char b[100];
>> char c;
>> } S;
>>
>> S x;
>>
>> void func1 (char d) {
>> for (int i = 0; i < 100; i++) {
>> x.b[i] += 1;
>> d = *p;
>> x.a += d;
>> }
>> }
>>
>> It seems like you want the compiler to hoist the read of `*p` above the
>> write to `x.b[i]`.
>> But that isn't generally possible, is it? because the caller might have
>> executed
>>
>> p = &x.b[3];
>>
>> before the call to func1.
>
> Here, "p" is a pointer to int, whereas b is a char array. Wouldn't "p =
> &x.b[3];" break ansi aliasing rules?
I was under the impression that that was the entire point of "omnipotent char"!
See lines 106-108 in the very file you're changing:
00106 // Character types are special and can alias anything.
00107 // In C++, this technically only includes "char" and "unsigned char",
00108 // and not "signed char". In C, it includes all three. For now,
00109 // the risk of exploiting this detail in C++ seems likely to outweigh
00110 // the benefit.
Source: http://clang.llvm.org/doxygen/CodeGenTBAA_8cpp_source.html
–Arthur
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits