On Fri, Nov 30, 2012 at 6:44 PM, Alex Rosenberg <[email protected]> wrote:
> I'd love a more general assume mechanism that other optimizations could use.
> e.g. alignment would simply be an available (x & mask) expression for the
> suitable passes to take advantage of.
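For concreteness, here is a sketch of the kind of thing Alex describes (not
from his message or the attached patches; the function name is invented): the
alignment fact can be written as an ordinary (x & mask) test whose failing
branch is marked unreachable, so any pass that understands the control flow
could pick it up. Whether LLVM's passes actually exploit this today is exactly
the gap a first-class assume mechanism would close.

    #include <stdint.h>

    /* Expose the alignment of 'a' as an explicit (x & mask) predicate. */
    void use_aligned(double *a, int n) {
        if (((uintptr_t)a & 15) != 0)
            __builtin_unreachable();   /* gcc/clang builtin: this branch never runs */
        /* From here on, the optimizer may treat 'a' as 16-byte aligned. */
        for (int i = 0; i < n; ++i)
            a[i] += 1.0;
    }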
http://llvm.org/PR810 for some history/context/support for this kind of thing

> Sent from my iPad
>
> On Nov 30, 2012, at 4:14 PM, Hal Finkel <[email protected]> wrote:
>
>> Hi everyone,
>>
>> Many compilers provide a way, through either pragmas or intrinsics, for
>> the user to assert stronger alignment requirements on a pointee than is
>> otherwise implied by the pointer's type. gcc now provides an intrinsic
>> for this purpose, __builtin_assume_aligned, and the attached patches (one
>> for Clang and one for LLVM) implement that intrinsic using a
>> corresponding LLVM intrinsic, and provide an infrastructure to take
>> advantage of this new information.
>>
>> ** BEGIN justification -- skip this if you don't care ;) **
>> First, let me provide some justification. It is currently possible in
>> Clang, using gcc-style (or C++11-style) attributes, to create typedefs
>> with stronger alignment requirements than the original type. This is a
>> useful feature, but it has shortcomings. First, for the purpose of
>> allowing the compiler to create vectorized code with aligned loads and
>> stores, these typedefs are awkward to use, and even more awkward to use
>> correctly. For example, if I have as a base case:
>>
>>   void foo(double *a, double *b) {
>>     for (int i = 0; i < N; ++i)
>>       a[i] = b[i] + 1;
>>   }
>>
>> and I want to say that a and b are both 16-byte aligned, I can write
>> instead:
>>
>>   typedef double __attribute__((aligned(16))) double16;
>>   void foo(double16 *a, double16 *b) {
>>     for (int i = 0; i < N; ++i)
>>       a[i] = b[i] + 1;
>>   }
>>
>> and this might work; the loads and stores will be tagged as 16-byte
>> aligned, and we can vectorize the loop into, for example, a loop over
>> <2 x double>. The problem is that the code is now incorrect: it implies
>> that *all* of the loads and stores are 16-byte aligned, and this is not
>> true. Only every other one is 16-byte aligned. It is possible to correct
>> this problem by manually unrolling the loop by a factor of 2:
>>
>>   void foo(double16 *a, double16 *b) {
>>     for (int i = 0; i < N; i += 2) {
>>       a[i] = b[i] + 1;
>>       ((double *) a)[i+1] = ((double *) b)[i+1] + 1;
>>     }
>>   }
>>
>> but this is awkward and error-prone.
>>
>> With the intrinsic, this is easier:
>>
>>   void foo(double *a, double *b) {
>>     a = __builtin_assume_aligned(a, 16);
>>     b = __builtin_assume_aligned(b, 16);
>>     for (int i = 0; i < N; ++i)
>>       a[i] = b[i] + 1;
>>   }
>>
>> This code can be vectorized with aligned loads and stores, and even if
>> it is not vectorized, it will remain correct.
>>
>> The second problem with the purely type-based approach is that it
>> requires manual loop unrolling and inlining. Because the intrinsics are
>> evaluated after inlining (and after loop unrolling), the optimizer can
>> use the alignment assumptions specified in the caller when generating
>> code for an inlined callee. This is a very important capability.
>>
>> The need to apply the alignment assumptions after inlining and loop
>> unrolling necessitates placing most of the infrastructure for this into
>> LLVM, with Clang only generating LLVM intrinsics. In addition, to take
>> full advantage of the information provided, it is necessary to look at
>> loop-dependent pointer offsets and strides; ScalarEvolution provides the
>> appropriate framework for doing this.
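To make the inlining point concrete, here is a small sketch (not from the
patch; the function names are invented): the assumption is stated once in the
caller, and after inlining it applies to the callee's loads and stores.

    static inline void scale(double *p, int n) {
        for (int i = 0; i < n; ++i)
            p[i] *= 2.0;
    }

    void caller(double *a, int n) {
        a = __builtin_assume_aligned(a, 16);  /* assumption stated in the caller */
        scale(a, n);                          /* once scale() is inlined here, its
                                                 accesses can use the 16-byte
                                                 assumption, e.g. for vectorization */
    }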
>> ** END justification **
>>
>> Mirroring the gcc (and now Clang) intrinsic, the corresponding LLVM
>> intrinsic is:
>>
>>   <t1>* @llvm.assume.aligned.p<s><t1>.<t2>(<t1>* addr, i32 alignment, <int t2> offset)
>>
>> which asserts that the address returned is offset bytes above an address
>> with the specified alignment. The attached patch makes some simple
>> changes to several analysis passes (like BasicAA and SE) to allow them
>> to 'look through' the intrinsic. It also adds a transformation pass that
>> propagates the alignment assumptions to loads and stores directly
>> dependent on the intrinsic's return value. Once this is done, the
>> intrinsics are removed so that they don't interfere with the remaining
>> optimizations.
>>
>> The patches are attached. I've also uploaded these to llvm-reviews (this
>> is my first time trying this, so please let me know if I should do
>> something differently):
>>
>>   Clang - http://llvm-reviews.chandlerc.com/D149
>>   LLVM - http://llvm-reviews.chandlerc.com/D150
>>
>> Please review.
>>
>> Nadav, one shortcoming of the current patch is that, while it will work
>> to vectorize loops using unroll+bb-vectorize, it will not automatically
>> work with the loop vectorizer. To really be effective, the
>> transformation pass needs to run after loop unrolling, and loop
>> unrolling is (and should be) run after loop vectorization. Even if run
>> prior to loop vectorization, it would not directly help the loop
>> vectorizer because the necessary strided loads and stores don't yet
>> exist. As a second step, I think we should split the current
>> transformation pass into a transformation pass and an analysis pass.
>> This analysis pass can then be used by the loop vectorizer (and any
>> other early passes that want the information) before the final
>> rewriting and intrinsic deletion is done.
>>
>> Thanks again,
>> Hal
>>
>> --
>> Hal Finkel
>> Postdoctoral Appointee
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> <asal-clang-20121130.patch>
>> <asal-llvm-20121130.patch>
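Given the intrinsic signature Hal describes above, the generated IR might look
roughly like the following. This is only a sketch: the mangled intrinsic name
and the i64 offset type are guesses based on that signature, not taken from
the patch itself.

    ; Assert that %a is 16-byte aligned (offset 0), then load through the result.
    define double @first(double* %a) {
    entry:
      %aa = call double* @llvm.assume.aligned.p0f64.i64(double* %a, i32 16, i64 0)
      %v = load double* %aa, align 8
      ; the propagation pass described above could raise this to 'align 16'
      ; before the intrinsic calls are deleted
      ret double %v
    }

    declare double* @llvm.assume.aligned.p0f64.i64(double*, i32, i64)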
