On Tue, 31 Jul 2012 12:59:33 -0700 Doug Gregor <[email protected]> wrote:
> On Fri, Jul 27, 2012 at 10:42 AM, Tanya Lattner <[email protected]> wrote:
> >
> > On Jul 27, 2012, at 2:41 AM, Hal Finkel wrote:
> >
> >> On Mon, 23 Jul 2012 13:14:07 -0700
> >> Tanya Lattner <[email protected]> wrote:
> >>
> >>>
> >>> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
> >>>
> >>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>>>>> Hi Tanya,
> >>>>>> Looks good and useful, but I'm not sure it should be clang's
> >>>>>> decision whether storing and loading vec4s is better than vec3.
> >>>>>
> >>>>> The idea was to have Clang generate code that the optimizers
> >>>>> would be more likely to do something useful and smart with. I
> >>>>> understand the concern, but I'm not sure where the best place
> >>>>> for this would be then?
> >>>>
> >>>> Hmm. The IR size of a <3 x blah> is basically the size of a
> >>>> <4 x blah> anyway; arguably the backend already has all the
> >>>> information it needs for this. Dan, what do you think?
> >>>>
> >>>> One objection to doing this in the frontend is that it's not
> >>>> clear to me that this is a transformation we should be doing if
> >>>> <4 x blah> isn't actually legal for the target. But I'm amenable
> >>>> to the idea that this belongs here.
> >>>>
> >>>
> >>> I do not think it's Clang's job to care about this, as we already
> >>> have this problem for other vector sizes, and it's target
> >>> lowering's job to fix it.
> >>>
> >>>> I'm also a little uncomfortable with this patch because it's so
> >>>> special-cased to 3. I understand that that might be all that
> >>>> OpenCL really cares about, but it seems silly to add this code
> >>>> that doesn't also kick in for, say, <7 x i16> or whatever. It
> >>>> really shouldn't be difficult to generalize.
> >>>
> >>> While it could be generalized, I am only 100% confident in the
> >>> codegen for vec3, as I know for sure that it improves the quality
> >>> of the code that is ultimately generated. This is also thoroughly
> >>> tested by our OpenCL compiler, so I am confident we are not
> >>> breaking anything and we are improving performance.
> >>
> >> At the request of several people, I recently enhanced the BB
> >> vectorizer to produce odd-sized vector types (when possible). This
> >> was for two reasons:
> >>
> >> 1. Some targets actually have instructions for length-3 vectors
> >> (mostly for doing things on x,y,z triples), and they wanted
> >> autovectorization support for these.
> >>
> >> 2. This is generally a win elsewhere as well because the odd-length
> >> vectors will be promoted to even-length vectors (which is good
> >> compared to leaving scalar code).
> >>
> >> In this context, I am curious to know how generating length-4
> >> vectors in the frontend gives better performance. Is this
> >> something that the BB vectorizer (or any other vectorizer) should
> >> be doing as well?
> >>
> >
> > While I think you are doing great work with your vectorizer, it's
> > not something we can rely on yet to get back this performance,
> > since it's not on by default.
> >
> > A couple more comments about my patch. First, vec3 is an OpenCL
> > type, and the spec defines it to have the same size and alignment
> > as vec4, so generating this code pattern for loads/stores makes
> > sense in that context. I could enable this only for OpenCL, but I
> > think anyone who uses vec3 would want this performance win.
> >
> > Secondly, I'm not arguing that this is the ultimate fix, but it is
> > valid, simple, and easy to remove once other areas of the compiler
> > are improved to handle this case.
> >
> > After discussing with others, the proper fix might be something
> > like this:
> >
> > 1. Enhance target data to encode the "native" vector types for the
> > target, along the same lines as the "native" integer types it can
> > already encode.
> >
> > 2. Enhance the IR optimizer to force vectors to those types when
> > it can get away with it.
> >
> > This optimization needs to happen early enough that other mid-level
> > optimizations can take advantage of it.
> >
> > I'd still like to check this in, as there is no alternative
> > solution right now. Because there is some debate, I think the code
> > owner needs to have the final say here (cc-ing Doug).
>
> I think it makes sense for this to go in. If our optimizers improve
> to the point where they can handle vec3 directly as well as (or
> better than) vec3-lowered-as-vec4, we can revisit this topic.

Would it make sense to have a command-line switch to enable the old
behavior, so that the status of the backend optimizers can be easily
checked?

 -Hal

>
>  - Doug

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
