Re: [cfe-commits] [PATCH] Optimize vec3 loads/stores

Benyei, Guy Tue, 24 Jul 2012 23:43:42 -0700

Hi Tanya,
Since in your patch the fourth element is always undef, I guess three-element 
vectors can still be detected in the backend even with the patch. 
It's good enough to know that this patch cannot cause any real problem, 
so I think it's OK.


Thanks
    Guy

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
On Behalf Of Tanya Lattner
Sent: Wednesday, July 25, 2012 01:03
To: llvm cfe
Cc: Dan Gohman
Subject: Re: [cfe-commits] [PATCH] Optimize vec3 loads/stores


On Jul 23, 2012, at 1:24 PM, Dan Gohman wrote:

> 
> On Jul 23, 2012, at 11:34 AM, Tanya Lattner <[email protected]> wrote:
> 
>> 
>> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
>> 
>>> 
>>> On Jul 18, 2012, at 6:51 PM, John McCall <[email protected]> wrote:
>>> 
>>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>>> Hi Tanya,
>>>>>> Looks good and useful, but I'm not sure if it should be clang's 
>>>>>> decision if storing and loading vec4s is better than vec3.
>>>>> 
>>>>> The idea was to have Clang generate code that the optimizers would be 
>>>>> more likely to do something useful and smart with. I understand the 
>>>>> concern, but I'm not sure where the best place for this would be then?
>>>> 
>>>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> 
>>>> anyway;  arguably the backend already has all the information it needs for 
>>>> this.  Dan, what do you think?
>>> 
>>> I guess optimizer passes won't be extraordinarily happy about all 
>>> this bitcasting and shuffling. It seems to me that we have a problem 
>>> in that we're splitting up the high-level task of "lower <3 x blah> to 
>>> <4 x blah>" and doing some of it in the front-end and some of it in the 
>>> backend.
>>> Ideally, we should do it all in one place, for conceptual 
>>> simplicity, and to avoid the awkwardness of having the optimizer run 
>>> in a world that's half one way and half the other, with awkward 
>>> little bridges between the two halves.
>> 
>> I think it's hard to speculate that the optimizer passes are not happy about 
>> the bit cast and shuffling. I'm running with optimizations on and the code 
>> is still much better than not having Clang do this "optimization" for vec3.
> 
> Sorry for being unclear; I was speculating more about future 
> optimization passes. I don't doubt your patch achieves its purpose today.
> 
>> I strongly feel that Clang can make the decision to output code like this if 
>> it leads to better code in the end. 
> 
> Ok. What do you think about having clang do all of the lowering of 
> <3 x blah> to <4 x blah> then? I mean all of the arithmetic, function 
> arguments and return values, and so on? In other words, is there 
> something special about loads and stores of vec3, or are they just one 
> symptom of a broader vec3 problem?
> 

For function args and return values, the calling convention will coerce the 
types (on X86). I haven't had time to totally verify, but I think that 
arithmetic is done correctly in the backend via widening. So it's mostly this 
one issue that we are trying to address. 

While it may still be a good idea for the backends to optimize situations such 
as this, I think it's still OK for Clang to go ahead and effectively widen the 
vector when doing its code generation, since it is a win for most targets 
(an assumption on my part, as I can't test them all). vec3 is pretty important 
for the OpenCL community and we'd like it to have good performance. 
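
As a rough illustration of the widening discussed above, here is a sketch in 
plain C. It is not the patch itself, and the names float4_t, float3_slot, 
float3_val, load_vec3, and store_vec3 are made up for this example. It assumes 
the OpenCL rule that a 3-component vector occupies the same storage as a 
4-component one, so the memory access can cover all four lanes and the fourth 
lane is simply ignored (undef in IR terms).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for OpenCL's float4 and float3. */
typedef struct { float s[4]; } float4_t;     /* a full 4-lane vector          */
typedef struct { float s[4]; } float3_slot;  /* vec3 in memory; s[3] is padding */
typedef struct { float x, y, z; } float3_val;/* the three lanes we care about */

/* Load: read all four lanes in one access, then keep lanes 0..2.
   The IR analogue would be a <4 x float> load followed by a shuffle. */
static float3_val load_vec3(const float3_slot *p) {
    assert(p != NULL);
    float4_t wide;
    memcpy(&wide, p, sizeof wide);           /* single 4-lane access */
    float3_val v = { wide.s[0], wide.s[1], wide.s[2] };
    return v;
}

/* Store: widen to four lanes and write them in one access.
   The fourth lane is "don't care"; C has no undef, so 0.0f is used here. */
static void store_vec3(float3_slot *p, float3_val v) {
    assert(p != NULL);
    float4_t wide = { { v.x, v.y, v.z, 0.0f } };
    memcpy(p, &wide, sizeof wide);           /* single 4-lane access */
}
```

The point of the sketch is only that the fourth lane never carries meaning, 
which is why a backend can still recognize the value as a three-element vector.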

Does anyone have a firm objection to this going in? I realize that all backends 
could be modified to try to handle this, but I don't see this happening in the 
near future. 

-Tanya



> Of course, I'm not asking you to do this work right now; I'm asking 
> whether this would be a better overall design.
> 
> Dan
> 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits