Bug #: 11266
Summary: Inefficient x86 vector code generation for add v16i8;
Product: libraries
Version: trunk
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]
Classification: Unclassified
From the email exchange between Andrew and Chris:
Consider the following function which doubles a <16 x i8> vector:
>
> define <16 x i8> @test(<16 x i8> %a) {
> %b = add <16 x i8> %a, %a
> ret <16 x i8> %b
> }
>
> If I compile it for x86 with llc like so:
>
> llc paddb.ll -filetype=asm -o=/dev/stdout
>
> I get a two-op function that just does paddb %xmm0, %xmm0 and then
> returns. llc does this regardless of the optimization level. Great!
>
> If I let the instcombine pass touch it like so:
>
> opt -instcombine paddb.ll | llc -filetype=asm -o=/dev/stdout
>
> or like so:
>
> opt -O3 paddb.ll | llc -filetype=asm -o=/dev/stdout
>
> then the add gets converted to a vector left shift by 1, which then
> lowers to a much slower function with about a hundred ops. No amount
> of optimization after the fact will simplify it back to paddb.
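
For reference, the canonicalized IR that instcombine produces looks roughly like
this (a sketch; the shift amount is a splat vector constant, spelled out
element by element in LLVM IR):

; Sketch of the instcombine output: x+x canonicalized to a shift-left by 1.
; The X86 backend should pattern-match this back to a single paddb; instead
; it scalarizes, since SSE2 has no per-element byte shift instruction.
define <16 x i8> @test(<16 x i8> %a) {
  %b = shl <16 x i8> %a, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1,
                          i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  ret <16 x i8> %b
}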
This sounds like a really serious X86 backend performance bug. Canonicalizing
"x+x" to a shift is the "right thing to do"; the backend should match the
shift and emit paddb.