On Sat, Dec 1, 2012 at 12:51 AM, Chandler Carruth <[email protected]> wrote:
> On Fri, Nov 30, 2012 at 4:14 PM, Hal Finkel <[email protected]> wrote:
>
>> Hi everyone,
>>
>> Many compilers provide a way, through either pragmas or intrinsics, for
>> the user to assert stronger alignment requirements on a pointee than is
>> otherwise implied by the pointer's type. gcc now provides an intrinsic for
>> this purpose, __builtin_assume_aligned, and the attached patches (one for
>> Clang and one for LLVM) implement that intrinsic using a corresponding LLVM
>> intrinsic, and provide an infrastructure to take advantage of this new
>> information.
>>
>> ** BEGIN justification -- skip this if you don't care ;) **
>
> I'd like to start off by saying, I think this is absolutely an important
> use case. =]
>
> <snip the attribute / type discussion ... i agree here>
>
>> With the intrinsic, this is easier:
>>
>>   foo (double *a, double *b) {
>>     a = __builtin_assume_aligned(a, 16);
>>     b = __builtin_assume_aligned(b, 16);
>>     for (int i = 0; i < N; ++i)
>>       a[i] = b[i] + 1;
>>   }
>>
>> This code can be vectorized with aligned loads and stores, and even if it
>> is not vectorized, it will remain correct.
>>
>> The second problem with the purely type-based approach is that it
>> requires manual loop unrolling and inlining. Because the intrinsics are
>> evaluated after inlining (and after loop unrolling), the optimizer can use
>> the alignment assumptions specified in the caller when generating code for
>> an inlined callee. This is a very important capability.
>
> I think this is going to be a very important point for me to understand
> fully. I see the problem with inlining, but not with unrolling. Can you
> describe what the issue is there?
>
> <snip>
>
>> Mirroring the gcc (and now Clang) intrinsic, the corresponding LLVM
>> intrinsic is:
>>
>>   <t1>* @llvm.assume.aligned.p<s><t1>.<t2>(<t1>* addr, i32 alignment, <int t2> offset)
>>
>> which asserts that the address returned is offset bytes above an address
>> with the specified alignment.
>> The attached patch makes some simple changes
>> to several analysis passes (like BasicAA and SE) to allow them to 'look
>> through' the intrinsic. It also adds a transformation pass that propagates
>> the alignment assumptions to loads and stores directly dependent on the
>> intrinsic's return value. Once this is done, the intrinsics are removed so
>> that they don't interfere with the remaining optimizations.
>
> I think this approach has more problems than you're currently dealing
> with. There are lots of passes that run before loop unrolling and
> which will need to look through pointers. I'm extremely uncomfortable
> adding a *new* way to propagate a pointer from one SSA value to another;
> there are soooo many places in the compiler we would have to fix. Your patch
> already has a very large amount of duplicated code scattered across a large
> collection of passes. I think this is an indication that this design isn't
> ideal, so I'd really like to see if we can come up with a better in-IR
> representation for this which will simplify how the rest of the optimizer
> interacts with it.
>
> I see a few options off the top of my head, but I don't yet understand the
> problem domain well enough to really dig into any of them. Specifically, I
> understand the transforms you're trying to enable, and the reason they
> aren't currently possible, but I don't understand the C/C++ code patterns
> you are imagining, or where in those code patterns this invariant will be
> introduced. Could you help with more examples?
>
> OK, on with the straw-man proposals:
>
> 1) Add support for declaring a pointer parameter's *value* to be aligned:
>
>   void foo([[clang::aligned_pointer(16)]] double *a,
>            [[clang::aligned_pointer(16)]] double *b) {
>     for (int i = 0; i < N; ++i)
>       a[i] = b[i] + 1;
>   }
>
> The idea here is that these alignment constraints would be substantially
> different from either the GCC alignment attributes or the C++11 alignment
> attributes.
> Note that these are actually very different, and Clang
> currently gets the C++11 one wrong.
>
> GCC alignment attribute: makes the *type* be aligned to the specified
> amount.
>
> C++11 alignment attribute: makes the *storage* be aligned to the specified
> amount. (I think... I'll check w/ Richard on Monday.)

The C++11 attribute can be used in two ways (neither is what you want here):
 * On the declaration of a variable or data member, to overalign the storage
   of that entity
 * On the definition of a tag type, to overalign all objects of that type

I suppose you could do this:

  template<typename T, size_t Align> struct alignas(Align) overaligned {
    T value;
  };
  void foo(overaligned<double, 16> *a, overaligned<double, 16> *b) {
    // ...
  }

...but I'm not sure whether the alignment information would make it into
the IR. Maybe this proposal could help with that?

> Interestingly enough, C++11's definition would almost work for you if the
> inliner flattens to the point at which storage is declared. My suggestion
> is just to be able to also mark a pointer itself at an interface boundary
> as carrying this constraint. It wouldn't change the type at all. You could
> imagine this working much like GCC's non-null attribute does.
>
> The way my suggestion might be implemented in LLVM: an attribute or
> metadata on the function parameters. Then we teach instcombine and the
> loop passes to detect when this constraint on alignment can be used to
> promote the alignment of loads and stores. The loop passes can detect when
> unrolling has allowed the stride to exceed the alignment, and then promote
> the load and store alignment, etc.
>
> 2) Use an invariant-based design much as Alex suggested. The syntax for
> this could be exactly the same as the GCC builtin you mention, or the same
> as my syntax in #1. This representation is, I think, a strictly more
> generalized LLVM IR model than the others under discussion.
>
> I have two completely off-the-cuff ideas about how to represent this.
> Others who've been thinking about it longer may have better ideas:
>
> a) Dead code to produce a CmpInst that the invariant asserts is true:
>
>   %ptrtoint = ptrtoint double* %ptr to i64
>   %lowbits = and i64 %ptrtoint, 0xF
>   %invariant_cmp = icmp eq i64 %lowbits, 0
>   call void @llvm.invariant(i1 %invariant_cmp)
>
> b) Dead code to produce two values which we assert are equivalent. Then,
> when trying to optimize one, we can leverage things known about the other
> (such as simplify demanded bits, etc.):
>
>   %ptrtoint = ptrtoint double* %ptr to i64
>   %maskedint = and i64 %ptrtoint, 0xFFFFFFFFFFFFFFF0
>   %maskedptr = inttoptr i64 %maskedint to double*
>   call void @llvm.equivalent(double* %ptr, double* %maskedptr)
>
> (I realize (b) here isn't really implementable w/ our current intrinsic
> overloading, but you get the idea.)
>
> The difference is: how do optimizations use them? With (b), the
> optimization can just query N equivalent values for any property. If it
> holds for any, it holds for all. With (a), we have to specifically
> interpret the predicate which is an invariant. While (a) seems like a bit
> of extra work, I actually prefer it because it seems more general, easier
> to understand, and to support fairly aggressive caching of invariants, etc.
>
> 3) Your proposal, but handled more like __builtin_expect: have a very
> early pass that converts the intrinsic into metadata attached to a
> particular pointer SSA value. Then teach passes to preserve the metadata,
> and have your pass promote alignment based on it.
>
> Of these, my preferences in order are #2, #3, and #1. I like #2 because it
> would begin building a tremendously powerful and much-needed piece of
> infrastructure in LLVM, and serve many needs in addition to alignment.
>
> Thoughts? If you're interested in working on #2, I'd love to talk through
> the design....
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
