http://llvm.org/bugs/show_bug.cgi?id=20006
Bug ID: 20006
Summary: uchar4 vector element-by-element LUT handled worse
than 3.4.
Product: clang
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: -New Bugs
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Created attachment 12640
--> http://llvm.org/bugs/attachment.cgi?id=12640&action=edit
Function demonstrating regression.
Given a simple LUT loop which operates independently on each element of a
uchar4, trunk presses ahead with a bunch of inserts and extracts through a
temporary vector, along with a flurry of type conversion operations.
In contrast, Clang 3.4 appears to abandon the vector pretense at the outset and
takes the scalar values directly from the source pointer -- completing the work
in half the time.
GCC 4.8 appears to behave like Clang 3.4, but additionally packs the output
into a single 32-bit scalar register before writing.
This affects both amd64 and ARM. Simply -Ofast to reproduce on amd64, and
additionally -mfpu=neon for ARM to ensure that SIMD operations are available.
--
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
LLVMbugs mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs