http://llvm.org/bugs/show_bug.cgi?id=20006

            Bug ID: 20006
           Summary: uchar4 vector element-by-element LUT handled worse
                    than 3.4.
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: -New Bugs
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified

Created attachment 12640
  --> http://llvm.org/bugs/attachment.cgi?id=12640&action=edit
Function demonstrating regression.

Given a simple LUT loop which operates independently on each element of a
uchar4, trunk presses ahead with a bunch of inserts and extracts through a
temporary vector, along with a flurry of type conversion operations.

In contrast, Clang 3.4 appears to abandon the vector pretense at the outset and
takes the scalar values directly from the source pointer -- completing the work
in half the time.

GCC 4.8 appears to behave like Clang 3.4, but additionally packs the output
into a single 32-bit scalar register before writing.


This affects both amd64 and ARM.  Simply -Ofast to reproduce on amd64, and
additionally -mfpu=neon for ARM to ensure that SIMD operations are available.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
LLVMbugs mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/llvmbugs

Reply via email to