On Jul 18, 2012, at 6:51 PM, John McCall wrote:
> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>> Hi Tanya,
>>> Looks good and usefull, but I'm not sure if it should be clang's decision
>>> if storing and loading vec4s is better than vec3.
>>
>> The idea was to have Clang generate code that the optimizers would be more
>> likely to do something useful and smart with. I understand the concern, but
>> I'm not sure where the best place for this would be then?
>
> Hmm. The IR size of a <3 x blah> is basically the size of a <4 x blah>
> anyway; arguably the backend already has all the information it needs for
> this. Dan, what do you think?
>
> One objection to doing this in the frontend is that it's not clear to me that
> this is a transformation we should be doing if <4 x blah> isn't actually
> legal for the target. But I'm amenable to the idea that this belongs here.
>
I do not think its Clangs job to care about this as we already have this
problem for other vector sizes and its target lowering's job to fix it.
> I'm also a little uncomfortable with this patch because it's so special-cased
> to 3. I understand that that might be all that OpenCL really cares about,
> but it seems silly to add this code that doesn't also kick in for, say, <7 x
> i16> or whatever. It really shouldn't be difficult to generalize.
While it could be generalized, I am only 100% confident in the codegen for vec3
as I know for sure that it improves the code quality that is ultimately
generated. This is also throughly tested by our OpenCL compiler so I am
confident we are not breaking anything and we are improving performance. From
my perspective, there is no vec7 in OpenCL, so there is no strong need for it
to be generalized. It seems to make sense that it would also improve code
quality for vec7. So while I could make the change, I don't have real world
apps to test it. I think this can easily be left as an extension for later.
>
> Also, this optimization assumes that sizeof(element) is a power of 2, because
> the only thing that the AST guarantees is that the size of the vector type is
> a power of 2, and nextPow2(3 x E) == nextPow2(4 x E) is only true in general
> if E is a power of 2. Is that reasonable? Is that checked somewhere?
>
I don't think this is officially checked anywhere but is a general rule.
-Tanya
> Two minor notes.
>
> + const llvm::VectorType *VTy =
> + dyn_cast<llvm::VectorType>(EltTy);
>
> How can this fail? If it can, that's interesting; if it can't, it would be
> better if the "is this a vector operation" checks used the IR type instead of
> the AST type.
>
> + llvm::PointerType *DstPtr = cast<llvm::PointerType>(Addr->getType());
> + if (DstPtr->getElementType() != SrcTy) {
>
> This can't fail in the case we care about, so you can just merge
> this into the if-block above.
>
> John.
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits