On Tue, 24 Jul 2012 08:05:18 +0530 Tobias Grosser <[email protected]> wrote:
> On 07/24/2012 01:54 AM, Dan Gohman wrote:
> > On Jul 23, 2012, at 11:34 AM, Tanya Lattner <[email protected]> wrote:
> >> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
> >>> On Jul 18, 2012, at 6:51 PM, John McCall <[email protected]> wrote:
> >>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
> >>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
> >>>>>> Hi Tanya,
> >>>>>> Looks good and useful, but I'm not sure it should be clang's
> >>>>>> decision whether storing and loading vec4s is better than vec3.
> >>>>>
> >>>>> The idea was to have Clang generate code that the optimizers
> >>>>> would be more likely to do something useful and smart with. I
> >>>>> understand the concern, but I'm not sure where the best place
> >>>>> for this would be then?
> >>>>
> >>>> Hmm. The IR size of a <3 x blah> is basically the size of a
> >>>> <4 x blah> anyway; arguably the backend already has all the
> >>>> information it needs for this. Dan, what do you think?
> >>>
> >>> I guess optimizer passes won't be extraordinarily happy about all
> >>> this bitcasting and shuffling. It seems to me that we have a
> >>> problem in that we're splitting up the high-level task of "lower
> >>> <3 x blah> to <4 x blah>" and doing some of it in the front end
> >>> and some of it in the backend. Ideally, we should do it all in
> >>> one place, for conceptual simplicity, and to avoid the
> >>> awkwardness of having the optimizer run in a world that's half
> >>> one way and half the other, with awkward little bridges between
> >>> the two halves.
> >>
> >> I think it's hard to speculate that the optimizer passes are not
> >> happy about the bitcast and shuffling. I'm running with
> >> optimizations on, and the code is still much better than not
> >> having Clang do this "optimization" for vec3.
> >
> > Sorry for being unclear; I was speculating more about future
> > optimization passes. I don't doubt your patch achieves its purpose
> > today.
> >
> >> I strongly feel that Clang can make the decision to output code
> >> like this if it leads to better code in the end.
> >
> > OK. What do you think about having clang do all of the lowering of
> > <3 x blah> to <4 x blah>, then? I mean all of the arithmetic,
> > function arguments and return values, and so on. In other words,
> > is there something special about loads and stores of vec3, or are
> > they just one symptom of a broader vec3 problem?
> >
> > Of course, I'm not asking you to do this work right now; I'm asking
> > whether this would be a better overall design.
>
> Having clang perform this transformation will also reduce the amount
> of optimization a basic-block vectorizer could possibly do. For
> example, a loop unrolled by 4 could possibly be transformed from
> 4 * <vec3> into 3 * <vec4>. This does not work today and will
> probably not work soon, but we should keep it in mind.

The (trunk) vectorizer should do this now: the vec3s could be combined
into vec12s, which should then be legalized to vec4s.

 -Hal

> (This does not mean I am against the patch; I just wanted to point
> this out.)
>
> Tobi

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
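
For readers following the thread, here is a minimal sketch of the
widened vec3 load/store pattern being discussed, assuming a float
ext_vector_type(3) value that has vec4 size and 16-byte alignment;
the function names are illustrative and the exact IR the patch emits
may differ:

  ; Store: widen the <3 x float> to <4 x float> (undef in the extra
  ; lane) and store all 16 bytes through a bitcast pointer.
  define void @store_float3(<3 x float> %v, <3 x float>* %p) {
    %wide = shufflevector <3 x float> %v, <3 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
    %p4 = bitcast <3 x float>* %p to <4 x float>*
    store <4 x float> %wide, <4 x float>* %p4, align 16
    ret void
  }

  ; Load: read the full <4 x float> and shuffle the extra lane away.
  define <3 x float> @load_float3(<3 x float>* %p) {
    %p4 = bitcast <3 x float>* %p to <4 x float>*
    %wide = load <4 x float>* %p4, align 16
    %v = shufflevector <4 x float> %wide, <4 x float> undef, <3 x i32> <i32 0, i32 1, i32 2>
    ret <3 x float> %v
  }

The wide forms trade the bitcast/shuffle noise Dan mentions for a
single, naturally aligned 16-byte access, which is roughly the shape
the backend would otherwise have to recover during legalization.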
