On Tue, Jul 31, 2012 at 1:24 PM, Hal Finkel <[email protected]> wrote:
> On Tue, 31 Jul 2012 12:59:33 -0700
> Doug Gregor <[email protected]> wrote:
>
>> On Fri, Jul 27, 2012 at 10:42 AM, Tanya Lattner <[email protected]>
>> wrote:
>> >
>> > On Jul 27, 2012, at 2:41 AM, Hal Finkel wrote:
>> >
>> >> On Mon, 23 Jul 2012 13:14:07 -0700
>> >> Tanya Lattner <[email protected]> wrote:
>> >>
>> >>>
>> >>> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
>> >>>
>> >>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>> >>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>> >>>>>> Hi Tanya,
>> >>>>>> Looks good and useful, but I'm not sure if it should be
>> >>>>>> clang's decision if storing and loading vec4s is better than
>> >>>>>> vec3.
>> >>>>>
>> >>>>> The idea was to have Clang generate code that the optimizers
>> >>>>> would be more likely to do something useful and smart with. I
>> >>>>> understand the concern, but I'm not sure where the best place
>> >>>>> for this would be then?
>> >>>>
>> >>>> Hmm. The IR size of a <3 x blah> is basically the size of a
>> >>>> <4 x blah> anyway; arguably the backend already has all the
>> >>>> information it needs for this. Dan, what do you think?
>> >>>>
>> >>>> One objection to doing this in the frontend is that it's not
>> >>>> clear to me that this is a transformation we should be doing
>> >>>> if <4 x blah> isn't actually legal for the target. But I'm
>> >>>> amenable to the idea that this belongs here.
>> >>>>
>> >>>
>> >>> I do not think it's Clang's job to care about this, as we
>> >>> already have this problem for other vector sizes, and it's
>> >>> target lowering's job to fix it.
>> >>>
>> >>>> I'm also a little uncomfortable with this patch because it's so
>> >>>> special-cased to 3. I understand that that might be all that
>> >>>> OpenCL really cares about, but it seems silly to add this code
>> >>>> that doesn't also kick in for, say, <7 x i16> or whatever. It
>> >>>> really shouldn't be difficult to generalize.
>> >>>
>> >>> While it could be generalized, I am only 100% confident in the
>> >>> codegen for vec3, as I know for sure that it improves the code
>> >>> quality that is ultimately generated. This is also thoroughly
>> >>> tested by our OpenCL compiler, so I am confident we are not
>> >>> breaking anything and we are improving performance.
>> >>
>> >> At the request of several people, I recently enhanced the BB
>> >> vectorizer to produce odd-sized vector types (when possible).
>> >> This was for two reasons:
>> >>
>> >> 1. Some targets actually have instructions for length-3 vectors
>> >> (mostly for doing things on x,y,z triples), and they wanted
>> >> autovectorization support for these.
>> >>
>> >> 2. This is generally a win elsewhere as well, because the
>> >> odd-length vectors will be promoted to even-length vectors
>> >> (which is good compared to leaving scalar code).
>> >>
>> >> In this context, I am curious to know how generating length-4
>> >> vectors in the frontend gives better performance. Is this
>> >> something that the BB vectorizer (or any other vectorizer)
>> >> should be doing as well?
>> >>
>> >
>> > While I think you are doing great work with your vectorizer, it's
>> > not something we can rely on yet to get back this performance,
>> > since it's not on by default.
>> >
>> > A couple more comments about my patch. First, vec3 is an OpenCL
>> > type defined in the spec, which says it should have the same size
>> > and alignment as vec4. So generating this code pattern for
>> > loads/stores makes sense in that context. I could enable this
>> > only for OpenCL, but I think anyone who uses vec3 would want this
>> > performance win.
>> >
>> > Secondly, I'm not arguing that this is the ultimate fix, but it
>> > is a valid, simple fix, and easy to remove once other areas of
>> > the compiler are improved to handle this case.
>> >
>> > After discussing with others, the proper fix might be something
>> > like this: 1. Enhance target data to encode the "native" vector
>> > types for the target, along the same lines as the "native"
>> > integer types that it already can do. 2. Enhance the IR optimizer
>> > to force vectors to those types when it can get away with it.
>> >
>> > This optimization needs to happen early enough that other
>> > mid-level optimizations can take advantage of it.
>> >
>> > I'd still like to check this in, as there is no alternative
>> > solution that exists right now. Because there is some debate, I
>> > think the code owner needs to have final say here (cc-ing Doug).
>>
>> I think it makes sense for this to go in. If our optimizers improve
>> to the point where they can handle vec3 directly as well as/better
>> than vec3-lowered-as-vec4, we can revisit this topic.
>
> Would it make sense to have a command-line switch to enable the old
> behavior so that the status of the backend optimizers can be easily
> checked?
At this point, doing that seems like extra work. If there's going to be
a concerted effort throughout the optimizers to make such vectors work
well, then it's worth adding that option as part of that concerted
effort.

 - Doug
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
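[Editor's note: the "native" integer types Tanya's proposed fix refers to
are encoded in LLVM's target data string via the "n" spec. The fragment
below is illustrative only; the string shown is in the style of an x86-64
layout of that era, and no analogous "native vector" spec existed at the
time of this thread.]

```
; Native integer widths are declared with the "n" spec, e.g.:
target datalayout = "e-p:64:64:64-i64:64:64-n8:16:32:64"
; The proposal would add an analogous spec listing the target's native
; vector widths, which the IR optimizer could then canonicalize toward.
```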
