On Fri, 27 Jul 2012 10:42:07 -0700 Tanya Lattner <[email protected]> wrote:
>
> On Jul 27, 2012, at 2:41 AM, Hal Finkel wrote:
>
> > On Mon, 23 Jul 2012 13:14:07 -0700
> > Tanya Lattner <[email protected]> wrote:
> >
> >>
> >> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
> >>
> >>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>>>> Hi Tanya,
> >>>>> Looks good and useful, but I'm not sure it should be Clang's
> >>>>> decision whether storing and loading vec4s is better than vec3.
> >>>>
> >>>> The idea was to have Clang generate code that the optimizers
> >>>> would be more likely to do something useful and smart with. I
> >>>> understand the concern, but I'm not sure where the best place
> >>>> for this would be then?
> >>>
> >>> Hmm. The IR size of a <3 x blah> is basically the size of a
> >>> <4 x blah> anyway; arguably the backend already has all the
> >>> information it needs for this. Dan, what do you think?
> >>>
> >>> One objection to doing this in the frontend is that it's not clear
> >>> to me that this is a transformation we should be doing if
> >>> <4 x blah> isn't actually legal for the target. But I'm amenable
> >>> to the idea that this belongs here.
> >>>
> >>
> >> I do not think it's Clang's job to care about this, as we already
> >> have this problem for other vector sizes, and it's target
> >> lowering's job to fix it.
> >>
> >>> I'm also a little uncomfortable with this patch because it's so
> >>> special-cased to 3. I understand that that might be all that
> >>> OpenCL really cares about, but it seems silly to add this code
> >>> that doesn't also kick in for, say, <7 x i16> or whatever. It
> >>> really shouldn't be difficult to generalize.
> >>
> >> While it could be generalized, I am only 100% confident in the
> >> codegen for vec3, as I know for sure that it improves the code
> >> quality that is ultimately generated.
> >> This is also thoroughly
> >> tested by our OpenCL compiler, so I am confident we are not
> >> breaking anything and we are improving performance.
> >
> > At the request of several people, I recently enhanced the BB
> > vectorizer to produce odd-sized vector types (when possible). This
> > was for two reasons:
> >
> > 1. Some targets actually have instructions for length-3 vectors
> > (mostly for doing things on x,y,z triples), and they wanted
> > autovectorization support for these.
> >
> > 2. This is generally a win elsewhere as well, because the
> > odd-length vectors will be promoted to even-length vectors (which
> > is good compared to leaving scalar code).
> >
> > In this context, I am curious to know how generating length-4
> > vectors in the frontend gives better performance. Is this something
> > that the BB vectorizer (or any other vectorizer) should be doing as
> > well?
>
> While I think you are doing great work with your vectorizer, it's not
> something we can rely on yet to get back this performance, since it's
> not on by default.

Thanks! I did not mean to imply that the vectorizer was, or should be,
some kind of substitute for this optimization. I wanted to know whether
I should be enhancing the vectorizer to do this same kind of "rounding
up" of the vectors it produces.

> A couple more comments about my patch. First, vec3 is an OpenCL type
> that is defined in the spec, which says it should have the same size
> and alignment as vec4. So generating this code pattern for
> loads/stores makes sense in that context. I could enable this only
> when compiling OpenCL, but I think anyone who uses vec3 would want
> this performance win.

Interesting, thanks! I was just wondering why there is a performance
win at all. I assume that vectors of length 3 are currently promoted
to length-4 vectors during legalization. Why should declaring them to
be length 4 instead of length 3 affect the code quality at all?
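To make the pattern under discussion concrete, here is a rough sketch (illustrative only — the function names are invented and the exact IR Clang emits may differ) of how a vec3 load and store can be widened to vec4 in LLVM IR. Since OpenCL gives float3 the size and alignment of float4, the fourth lane is padding:

```llvm
; Illustrative sketch, not the patch's literal output.

; Store a <3 x float> by widening it to <4 x float>; the extra lane
; is padding, so it can be written as undef.
define void @store_float3(<3 x float> %v, <4 x float>* %p) {
  %wide = shufflevector <3 x float> %v, <3 x float> undef,
                        <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
  store <4 x float> %wide, <4 x float>* %p, align 16
  ret void
}

; Load all four lanes, then shuffle away the padding lane.
define <3 x float> @load_float3(<4 x float>* %p) {
  %wide = load <4 x float>* %p, align 16
  %v = shufflevector <4 x float> %wide, <4 x float> undef,
                     <3 x i32> <i32 0, i32 1, i32 2>
  ret <3 x float> %v
}
```

The intuition is that a single 16-byte aligned vector access is typically cheaper for the backend than the sequence of narrower accesses legalization might otherwise produce for a 12-byte <3 x float>.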
>
> Secondly, I'm not arguing that this is the ultimate fix, but it is
> valid, simple, and easy to remove once other areas of the compiler
> are improved to handle this case.
>
> After discussing with others, the proper fix might be something like
> this:
>
> 1. Enhance target data to encode the "native" vector types for the
> target, along the same lines as the "native" integer types that it
> already can do.

In the medium term, we need this and a lot more (costs for shuffles,
aligned vs. unaligned loads, etc.). We should certainly have a
discussion about how best to do this (probably in a different thread).

> 2. Enhance the IR optimizer to force vectors to those types when it
> can get away with it.
>
> This optimization needs to be early enough that other mid-level
> optimizations can take advantage of it.
>
> I'd still like to check this in, as no alternative solution exists
> right now. Because there is some debate, I think the code owner
> needs to have final say here (cc-ing Doug).

Please don't count me as objecting. We might want, however, a flag to
turn this off... it may be that at some point in the future the
LLVM-level logic will be sufficient to render this unnecessary, and
we'll probably want an easy way to run performance tests to find out
when that happens.

Thanks again,
Hal

>
> Thanks,
> Tanya
>
> >
> > Thanks again,
> > Hal
> >
> >> From my perspective, there is no vec7 in OpenCL, so there is no
> >> strong need for it to be generalized. It seems to make sense that
> >> it would also improve code quality for vec7. So while I could make
> >> the change, I don't have real-world apps to test it. I think this
> >> can easily be left as an extension for later.
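For context on the proposed fix: the "native" integer types mentioned above are declared today in a target's datalayout string via the "n" entry. A vector analogue would presumably extend the same mechanism; the sketch below shows an abbreviated x86-64-style string of that era, and the vector entry described in the comment is purely hypothetical, not an existing LLVM feature:

```llvm
; Abbreviated datalayout sketch. "n8:16:32:64" lists the target's
; native integer widths; the proposal would add an analogous
; (hypothetical) entry listing native vector widths.
target datalayout = "e-p:64:64:64-i32:32:32-i64:64:64-f32:32:32-v128:128:128-n8:16:32:64"
```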
> >>
> >>> Also, this optimization assumes that sizeof(element) is a power
> >>> of 2, because the only thing that the AST guarantees is that the
> >>> size of the vector type is a power of 2, and nextPow2(3 x E) ==
> >>> nextPow2(4 x E) is only true in general if E is a power of 2. Is
> >>> that reasonable? Is that checked somewhere?
> >>>
> >>
> >> I don't think this is officially checked anywhere, but it is a
> >> general rule.
> >>
> >> -Tanya
> >>
> >>
> >>> Two minor notes.
> >>>
> >>> +    const llvm::VectorType *VTy =
> >>> +      dyn_cast<llvm::VectorType>(EltTy);
> >>>
> >>> How can this fail? If it can, that's interesting; if it can't,
> >>> it would be better if the "is this a vector operation" checks
> >>> used the IR type instead of the AST type.
> >>>
> >>> +    llvm::PointerType *DstPtr =
> >>>        cast<llvm::PointerType>(Addr->getType());
> >>> +    if (DstPtr->getElementType() != SrcTy) {
> >>>
> >>> This can't fail in the case we care about, so you can just merge
> >>> this into the if-block above.
> >>>
> >>> John.

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
