What happens if someone uses the intrinsic to *weaken* the alignment of the
given type? For example:

  double *d;
  d = __builtin_assume_aligned(d, 1);
C99 6.3.2.3p7 and C++11 [expr.reinterpret.cast]p17 say it is undefined (an
issue with which I am all too familiar these days). Should the compiler
reject the code or allow it? I note that the compiler allows
__attribute__((aligned(1))), so this should probably also be allowed, but
anyone using it must proceed with caution.
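For concreteness, both GCC and Clang accept a weakened assumption today; it
simply conveys no useful information to the optimizer. A minimal sketch (the
helper name `weaken` is mine, purely illustrative):

```c
#include <assert.h>

/* Hypothetical helper: applies a *weaker* alignment assumption (1 byte)
 * than double's natural alignment. The compiler accepts this, just as it
 * accepts __attribute__((aligned(1))), but it gives the optimizer nothing
 * to work with, so callers must proceed with caution. */
static double *weaken(double *d) {
    return __builtin_assume_aligned(d, 1);
}
```

Note that the builtin returns its argument unchanged; only the optimizer's
assumptions about the returned pointer differ.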
Schwieb
On Nov 30, 2012, at 6:44 PM, Alex Rosenberg <[email protected]> wrote:
> I'd love a more general assume mechanism that other optimizations could use.
> e.g. alignment would simply be an available (x & mask) expression for the
> suitable passes to take advantage of.
>
> Sent from my iPad
>
> On Nov 30, 2012, at 4:14 PM, Hal Finkel <[email protected]> wrote:
>
>> Hi everyone,
>>
>> Many compilers provide a way, through either pragmas or intrinsics, for the
>> user to assert stronger alignment requirements on a pointee than is
>> otherwise implied by the pointer's type. gcc now provides an intrinsic for
>> this purpose, __builtin_assume_aligned, and the attached patches (one for
>> Clang and one for LLVM) implement that intrinsic using a corresponding LLVM
>> intrinsic, and provide an infrastructure to take advantage of this new
>> information.
>>
>> ** BEGIN justification -- skip this if you don't care ;) **
>> First, let me provide some justification. It is currently possible in Clang,
>> using gcc-style (or C++11-style) attributes, to create typedefs with
>> stronger alignment requirements than the original type. This is a useful
>> feature, but it has shortcomings. First, for the purpose of allowing the
>> compiler to create vectorized code with aligned loads and stores, they are
>> awkward to use, and even more awkward to use correctly. For example, if I
>> have as a base case:
>> void foo(double *a, double *b) {
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> and I want to say that a and b are both 16-byte aligned, I can write instead:
>> typedef double __attribute__((aligned(16))) double16;
>> void foo(double16 *a, double16 *b) {
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> and this might work; the loads and stores will be tagged as 16-byte aligned,
>> and we can vectorize the loop into, for example, a loop over <2 x double>.
>> The problem is that the code is now incorrect: it implies that *all* of the
>> loads and stores are 16-byte aligned, and this is not true; only every
>> other one is 16-byte aligned. It is possible to correct this problem by
>> manually unrolling the loop by a factor of 2:
>> void foo(double16 *a, double16 *b) {
>>   for (int i = 0; i < N; i += 2) {
>>     a[i] = b[i] + 1;
>>     ((double *) a)[i+1] = ((double *) b)[i+1] + 1;
>>   }
>> }
>> but this is awkward and error-prone.
>>
>> With the intrinsic, this is easier:
>> void foo(double *a, double *b) {
>>   a = __builtin_assume_aligned(a, 16);
>>   b = __builtin_assume_aligned(b, 16);
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> This code can be vectorized with aligned loads and stores, and even if it
>> is not vectorized, it remains correct.
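A self-contained sketch of the intrinsic-based version (I have added an
explicit length parameter `n` and used C11 `aligned_alloc` so that the
16-byte assumption actually holds at run time; those details are mine, not
part of the proposal):

```c
#include <assert.h>
#include <stdlib.h>

/* The intrinsic-based example: assert 16-byte alignment once, up front;
 * the loop itself is unchanged and stays correct even if not vectorized. */
static void foo(double *a, double *b, int n) {
    a = __builtin_assume_aligned(a, 16);
    b = __builtin_assume_aligned(b, 16);
    for (int i = 0; i < n; ++i)
        a[i] = b[i] + 1;
}

/* Drive foo with genuinely 16-byte-aligned buffers. */
static int demo(void) {
    enum { N = 8 };
    double *a = aligned_alloc(16, N * sizeof *a);
    double *b = aligned_alloc(16, N * sizeof *b);
    if (!a || !b)
        return 0;
    for (int i = 0; i < N; ++i)
        b[i] = i;
    foo(a, b, N);
    int ok = 1;
    for (int i = 0; i < N; ++i)
        ok &= (a[i] == i + 1);
    free(a);
    free(b);
    return ok;
}
```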
>>
>> The second problem with the purely type-based approach is that it requires
>> manual loop unrolling and inlining. Because the intrinsics are evaluated
>> after inlining (and after loop unrolling), the optimizer can use the
>> alignment assumptions specified in the caller when generating code for an
>> inlined callee. This is a very important capability.
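To illustrate the inlining point with a sketch (function names are
illustrative, not from the patch): `sum` makes no alignment claim of its
own, but once it is inlined into `caller`, its loads are covered by the
assumption stated in the caller.

```c
#include <assert.h>

/* A generic helper with no alignment annotations of its own. */
static double sum(const double *p, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}

static double caller(const double *p, int n) {
    /* The assumption is stated here, in the caller... */
    p = __builtin_assume_aligned(p, 16);
    /* ...and after sum() is inlined, it covers sum's loads too. */
    return sum(p, n);
}

/* Use 16-byte-aligned storage so the assumption is actually true. */
static double demo_sum(void) {
    _Alignas(16) double v[4] = {1.0, 2.0, 3.0, 4.0};
    return caller(v, 4);
}
```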
>>
>> The need to apply the alignment assumptions after inlining and loop
>> unrolling necessitates placing most of the infrastructure for this into
>> LLVM, with Clang only generating LLVM intrinsics. In addition, to take
>> full advantage of the information provided, it is necessary to look at
>> loop-dependent pointer offsets and strides; ScalarEvolution provides the
>> appropriate framework for doing this.
>> ** END justification **
>>
>> Mirroring the gcc (and now Clang) intrinsic, the corresponding LLVM
>> intrinsic is:
>> <t1>* @llvm.assume.aligned.p<s><t1>.<t2>(<t1>* addr, i32 alignment,
>>                                          <int t2> offset)
>> which asserts that the address returned is offset bytes above an address
>> with the specified alignment. The attached patch makes some simple changes
>> to several analysis passes (like BasicAA and SE) to allow them to 'look
>> through' the intrinsic. It also adds a transformation pass that propagates
>> the alignment assumptions to loads and stores directly dependent on the
>> intrinsic's return value. Once this is done, the intrinsics are removed so
>> that they don't interfere with the remaining optimizations.
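For concreteness, a call to the proposed intrinsic instantiated for a
double* in address space 0 with an i64 offset operand might look like the
following (the exact name mangling here is my reading of the signature
above, not final syntax):

```llvm
; %b is asserted to be exactly 8 bytes above a 16-byte-aligned address
%b = call double* @llvm.assume.aligned.p0f64.i64(double* %a, i32 16, i64 8)
%v = load double* %b
```

The propagation pass would then mark loads and stores fed by %b with the
implied alignment and delete the call before later optimizations run.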
>>
>> The patches are attached. I've also uploaded these to llvm-reviews (this is
>> my first time trying this, so please let me know if I should do something
>> differently):
>> Clang - http://llvm-reviews.chandlerc.com/D149
>> LLVM - http://llvm-reviews.chandlerc.com/D150
>>
>> Please review.
>>
>> Nadav, One shortcoming of the current patch is that, while it will work to
>> vectorize loops using unroll+bb-vectorize, it will not automatically work
>> with the loop vectorizer. To really be effective, the transformation pass
>> needs to run after loop unrolling, and loop unrolling is (and should be)
>> run after loop vectorization. Even if run prior to loop vectorization, it would
>> not directly help the loop vectorizer because the necessary strided loads
>> and stores don't yet exist. As a second step, I think we should split the
>> current transformation pass into a transformation pass and an analysis pass.
>> This analysis pass can then be used by the loop vectorizer (and any other
>> early passes that want the information) before the final rewriting and
>> intrinsic deletion is done.
>>
>> Thanks again,
>> Hal
>>
>> --
>> Hal Finkel
>> Postdoctoral Appointee
>> Leadership Computing Facility
>> Argonne National Laboratory
>> <asal-clang-20121130.patch>
>> <asal-llvm-20121130.patch>
>> _______________________________________________
>> llvm-commits mailing list
>> [email protected]
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits