http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46419
Summary: xmmintrin.h: _mm_cvtpu16_ps (and hence _mm_cvtpu8_ps)
returns false result in gcc >= 4.4
Product: gcc
Version: 4.4.5
Status: UNCONFIRMED
Severity: critical
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: release_candid...@yahoo.com
Created attachment 22367
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22367
example code
Dear GCC developers,
I guess the patch set
<http://gcc.gnu.org/viewcvs?view=revision&revision=134558> broke the
_mm_cvtpu16_ps() and _mm_cvtpu8_ps() intrinsics.
For demonstration, please refer to the attached example. It is intended to
convert four chars (1,2,3,4) into an SSE float vector (__m128) by using the
Intel intrinsics _mm_setr_pi8() and _mm_cvtpu8_ps().
The output of the program compiled with gcc-4.3 is:
image: 1 2 3 4
out4: 1 2 3 4
This result is correct. It agrees with Intel's intrinsic docs
<http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_mmx_set.htm>
and
<http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse_conversion.htm>,
and it matches the output of the same program compiled with icc.
The output of gcc-4.4 and gcc-4.5 compilation is:
image: 1 2 3 4
out4: 3 4 1 2
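For readers without access to the attachment, the check described above can be
sketched roughly as follows. This is not the attached file; the helper names
(ref_cvtpu8_ps, sse_cvtpu8_ps, cvtpu8_ps_matches_reference) are made up for
illustration, and the sketch assumes an x86 target with MMX and SSE enabled
(on other targets the intrinsic path is simply skipped). It compares the
intrinsic result against a plain scalar conversion of the same four bytes.

```c
#include <stdio.h>
#include <string.h>

#if defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>          /* also provides the __m64 MMX intrinsics */
#define HAVE_MMX_SSE 1
#else
#define HAVE_MMX_SSE 0
#endif

/* Scalar reference (hypothetical helper): the four unsigned 8-bit inputs
   converted to float, in the same order, which is the result reported
   for gcc-4.3 and icc. */
static void ref_cvtpu8_ps(const unsigned char in[4], float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (float)in[i];
}

/* Intrinsic path (hypothetical helper). Returns 1 if it ran, 0 if the
   target has no MMX/SSE support and nothing was computed. */
static int sse_cvtpu8_ps(const unsigned char in[4], float out[4])
{
#if HAVE_MMX_SSE
    __m64 packed = _mm_setr_pi8((char)in[0], (char)in[1],
                                (char)in[2], (char)in[3], 0, 0, 0, 0);
    __m128 v = _mm_cvtpu8_ps(packed);   /* lower four bytes -> floats */
    _mm_storeu_ps(out, v);
    _mm_empty();                        /* leave MMX state before x87 use */
    return 1;
#else
    (void)in;
    (void)out;
    return 0;
#endif
}

/* Returns 1 if the intrinsic agrees with the scalar reference (or could
   not be run on this target), 0 if the lanes differ. */
static int cvtpu8_ps_matches_reference(void)
{
    const unsigned char image[4] = { 1, 2, 3, 4 };
    float expected[4], out4[4];

    ref_cvtpu8_ps(image, expected);
    if (!sse_cvtpu8_ps(image, out4))
        return 1;

    printf("image: %d %d %d %d\n", image[0], image[1], image[2], image[3]);
    printf("out4: %g %g %g %g\n", out4[0], out4[1], out4[2], out4[3]);
    return memcmp(expected, out4, sizeof expected) == 0;
}
```

Calling cvtpu8_ps_matches_reference() from main() and returning a non-zero exit
status when it yields 0 would make the swapped lanes shown above show up as a
test failure on an affected compiler.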
I was able to trace this back to the change set referenced above. If I include
the old xmmintrin.h instead of the new header when using gcc-4.4, the result is
correct again. I didn't study the changes in rev. 134558 in detail, and I do
not know whether the new algorithm is theoretically correct at all.
Could you please fix this bug?
I don't know about the other intrinsics touched by that patch.
In this context, bug
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37496> might also be worth a look.
Thanks,
Dirk