What happens if someone uses the intrinsic to *weaken* the alignment of the
given type? For example:

  double *d;
  d = __builtin_assume_aligned(d, 1);
C99 6.3.2.3p7 and C++11 [expr.reinterpret.cast]p17 say it is undefined (an
issue with which I am all too familiar these days). Should the compiler
reject the code or allow it? I note that the compiler allows
__attribute__((aligned(1))), so this should probably also be allowed, but
anyone using it must proceed with caution.
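For concreteness, both GCC and Clang accept a weakened assumption today; it
simply conveys no useful information to the optimizer. A minimal sketch (the
helper name `weaken` is mine, purely illustrative):

```c
#include <assert.h>

/* Hypothetical helper: applies a *weaker* alignment assumption (1 byte)
 * than double's natural alignment. The compiler accepts this, just as it
 * accepts __attribute__((aligned(1))), but it gives the optimizer nothing
 * to work with, so callers must proceed with caution. */
static double *weaken(double *d) {
    return __builtin_assume_aligned(d, 1);
}
```

Note that the builtin returns its argument unchanged; only the optimizer's
assumptions about the returned pointer differ.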
Schwieb
On Nov 30, 2012, at 6:44 PM, Alex Rosenberg <[email protected]> wrote:
> I'd love a more general assume mechanism that other optimizations could use.
> e.g. alignment would simply be an available (x & mask) expression for the
> suitable passes to take advantage of.
>
> Sent from my iPad
>
> On Nov 30, 2012, at 4:14 PM, Hal Finkel <[email protected]> wrote:
>
>> Hi everyone,
>>
>> Many compilers provide a way, through either pragmas or intrinsics, for the
>> user to assert stronger alignment requirements on a pointee than is
>> otherwise implied by the pointer's type. gcc now provides an intrinsic for
>> this purpose, __builtin_assume_aligned, and the attached patches (one for
>> Clang and one for LLVM) implement that intrinsic using a corresponding LLVM
>> intrinsic, and provide an infrastructure to take advantage of this new
>> information.
>>
>> ** BEGIN justification -- skip this if you don't care ;) **
>> First, let me provide some justification. It is currently possible in Clang,
>> using gcc-style (or C++11-style) attributes, to create typedefs with
>> stronger alignment requirements than the original type. This is a useful
>> feature, but it has shortcomings. First, for the purpose of allowing the
>> compiler to create vectorized code with aligned loads and stores, they are
>> awkward to use, and even more awkward to use correctly. For example, if I
>> have as a base case:
>> void foo(double *a, double *b) {
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> and I want to say that a and b are both 16-byte aligned, I can write instead:
>> typedef double __attribute__((aligned(16))) double16;
>> void foo(double16 *a, double16 *b) {
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> and this might work; the loads and stores will be tagged as 16-byte aligned,
>> and we can vectorize the loop into, for example, a loop over <2 x double>.
>> The problem is that the code is now incorrect: it implies that *all* of the
>> loads and stores are 16-byte aligned, and this is not true; only every
>> other one is 16-byte aligned. It is possible to correct this problem by
>> manually unrolling the loop by a factor of 2:
>> void foo(double16 *a, double16 *b) {
>>   for (int i = 0; i < N; i += 2) {
>>     a[i] = b[i] + 1;
>>     ((double *) a)[i+1] = ((double *) b)[i+1] + 1;
>>   }
>> }
>> but this is awkward and error-prone.
>>
>> With the intrinsic, this is easier:
>> void foo(double *a, double *b) {
>>   a = __builtin_assume_aligned(a, 16);
>>   b = __builtin_assume_aligned(b, 16);
>>   for (int i = 0; i < N; ++i)
>>     a[i] = b[i] + 1;
>> }
>> This code can be vectorized with aligned loads and stores, and even if it
>> is not vectorized, it remains correct.
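A self-contained sketch of the intrinsic-based version (I have added an
explicit length parameter `n` and used C11 `aligned_alloc` so that the
16-byte assumption actually holds at run time; those details are mine, not
part of the proposal):

```c
#include <assert.h>
#include <stdlib.h>

/* The intrinsic-based example: assert 16-byte alignment once, up front;
 * the loop itself is unchanged and stays correct even if not vectorized. */
static void foo(double *a, double *b, int n) {
    a = __builtin_assume_aligned(a, 16);
    b = __builtin_assume_aligned(b, 16);
    for (int i = 0; i < n; ++i)
        a[i] = b[i] + 1;
}

/* Drive foo with genuinely 16-byte-aligned buffers. */
static int demo(void) {
    enum { N = 8 };
    double *a = aligned_alloc(16, N * sizeof *a);
    double *b = aligned_alloc(16, N * sizeof *b);
    if (!a || !b)
        return 0;
    for (int i = 0; i < N; ++i)
        b[i] = i;
    foo(a, b, N);
    int ok = 1;
    for (int i = 0; i < N; ++i)
        ok &= (a[i] == i + 1);
    free(a);
    free(b);
    return ok;
}
```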
>>
>> The second problem with the purely type-based approach is that it requires
>> manual loop unrolling and inlining. Because the intrinsics are evaluated
>> after inlining (and after loop unrolling), the optimizer can use the
>> alignment assumptions specified in the caller when generating code for an
>> inlined callee. This is a very important capability.
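To illustrate the inlining point with a sketch (function names are
illustrative, not from the patch): `sum` makes no alignment claim of its
own, but once it is inlined into `caller`, its loads are covered by the
assumption stated in the caller.

```c
#include <assert.h>

/* A generic helper with no alignment annotations of its own. */
static double sum(const double *p, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}

static double caller(const double *p, int n) {
    /* The assumption is stated here, in the caller... */
    p = __builtin_assume_aligned(p, 16);
    /* ...and after sum() is inlined, it covers sum's loads too. */
    return sum(p, n);
}

/* Use 16-byte-aligned storage so the assumption is actually true. */
static double demo_sum(void) {
    _Alignas(16) double v[4] = {1.0, 2.0, 3.0, 4.0};
    return caller(v, 4);
}
```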
>>
>> The need to apply the alignment assumptions after inlining and loop
>> unrolling necessitates placing most of the infrastructure for this into
>> LLVM, with Clang only generating LLVM intrinsics. In addition, to take
>> full advantage of the information provided, it is necessary to look at
>> loop-dependent pointer offsets and strides; ScalarEvolution provides the
>> appropriate framework for doing this.
>> ** END justification **
>>
>> Mirroring the gcc (and now Clang) intrinsic, the corresponding LLVM
>> intrinsic is:
>> <t1>* @llvm.assume.aligned.p<s><t1>.<t2>(<t1>* addr, i32 alignment,
>>                                          <int t2> offset)
>> which asserts that the address returned is offset bytes above an address
>> with the specified alignment. The attached patch makes some simple changes
>> to several analysis passes (like BasicAA and SE) to allow them to 'look
>> through' the intrinsic. It also adds a transformation pass that propagates
>> the alignment assumptions to loads and stores directly dependent on the
>> intrinsic's return value. Once this is done, the intrinsics are removed so
>> that they don't interfere with the remaining optimizations.
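For concreteness, a call to the proposed intrinsic instantiated for a
double* in address space 0 with an i64 offset operand might look like the
following (the exact name mangling here is my reading of the signature
above, not final syntax):

```llvm
; %b is asserted to be exactly 8 bytes above a 16-byte-aligned address
%b = call double* @llvm.assume.aligned.p0f64.i64(double* %a, i32 16, i64 8)
%v = load double* %b
```

The propagation pass would then mark loads and stores fed by %b with the
implied alignment and delete the call before later optimizations run.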
>>
>> The patches are attached. I've also uploaded these to llvm-reviews (this is
>> my first time trying this, so please let me know if I should do something
>> differently):
>> Clang - http://llvm-reviews.chandlerc.com/D149
>> LLVM - http://llvm-reviews.chandlerc.com/D150
>>
>> Please review.
>>
>> Nadav, One shortcoming of the current patch is that, while it will work to
>> vectorize loops using unroll+bb-vectorize, it will not automatically work
>> with the loop vectorizer. To really be effective, the transformation pass
>> needs to run after loop unrolling, and loop unrolling is (and should be)
>> run after loop vectorization. Even if run prior to loop vectorization, it would
>> not directly help the loop vectorizer because the necessary strided loads
>> and stores don't yet exist. As a second step, I think we should split the
>> current transformation pass into a transformation pass and an analysis pass.
>> This analysis pass can then be used by the loop vectorizer (and any other
>> early passes that want the information) before the final rewriting and
>> intrinsic deletion is done.
>>
>> Thanks again,
>> Hal
>>
>> --
>> Hal Finkel
>> Postdoctoral Appointee
>> Leadership Computing Facility
>> Argonne National Laboratory
>> <asal-clang-20121130.patch>
>> <asal-llvm-20121130.patch>
>> _______________________________________________
>> llvm-commits mailing list
>> [email protected]
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits