Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2.
......................................................................
Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3081/3/be/src/util/bit-util.inline.h
File be/src/util/bit-util.inline.h:

Line 140: const __m128i mask = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
> Hi Jim. Thank you for providing this pseudocode. Actually, the macro "
> #ifn

Hi Jim. I have run some simple tests. To keep the description short, I define the test items as follows:

1. Scalar: call the function ByteSwapScalar(void* dest, const void* source, int len);
2. SSE4.2[OUTSIDE]: put the variable "const __m128i mask" outside the function;
3. SSE4.2[INSIDE-STATIC]: put the variable "const __m128i mask" inside the function WITH the static modifier;
4. SSE4.2[INSIDE-NOT-STATIC]: put the variable "const __m128i mask" inside the function WITHOUT the static modifier;
5. AVX2[INSIDE-STATIC]: put the variable "const __m256i mask" inside the function WITH the static modifier;
6. AVX2[INSIDE-NOT-STATIC]: put the variable "const __m256i mask" inside the function WITHOUT the static modifier.

Note: GCC's support for AVX2 is not yet good enough, so the code does not compile when the variable "const __m256i mask" is placed outside the function. That is why there is no AVX2[OUTSIDE] item.

Test approach:
1. Prepare a uint8_t array of 10000000 elements whose values are randomly generated;
2. Use each of the 6 approaches to swap this array 1000 times and measure the elapsed time;
3. SSE4.2 call: ByteSwapSIMD<16>;
4. AVX2 call: ByteSwapSIMD<32>.

CPU info: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz

Performance results (speedup relative to Scalar):

Scalar:                    1x
SSE4.2[OUTSIDE]:           3.00x
SSE4.2[INSIDE-STATIC]:     2.75x
SSE4.2[INSIDE-NOT-STATIC]: 2.89x
AVX2[INSIDE-STATIC]:       2.90x
AVX2[INSIDE-NOT-STATIC]:   3.27x

Conclusion: for SSE4.2, we should put the "const __m128i mask" initializer outside the function. For AVX2, we should declare the mask inside the function and not use the static modifier.
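For readers without the patch at hand, the variants measured above can be approximated with the minimal sketch below. It is not the code under review: ReverseBytesScalar, ReverseBytesSimd and TimeMs are hypothetical names, and the sketch assumes ByteSwap means reversing the byte order of the whole buffer. The shuffle uses _mm_shuffle_epi8, which needs SSSE3 (implied by SSE4.2); when the compiler is not targeting SSSE3, the sketch falls back to the scalar loop. The mask is shown as a non-static local; moving that one declaration to namespace scope, or marking it static, gives the other two placements compared above.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSSE3__)
#include <tmmintrin.h>  // _mm_shuffle_epi8
#endif

// Scalar reference: dest[i] = source[len - 1 - i]. Buffers must not overlap.
inline void ReverseBytesScalar(uint8_t* dest, const uint8_t* source, size_t len) {
  for (size_t i = 0; i < len; ++i) dest[i] = source[len - 1 - i];
}

// Hypothetical SIMD variant: reverses 16 bytes per step, then finishes the
// remaining (len % 16) bytes with the scalar loop.
inline void ReverseBytesSimd(uint8_t* dest, const uint8_t* source, size_t len) {
  size_t done = 0;
#if defined(__SSSE3__)
  // Non-static local mask (the INSIDE-NOT-STATIC placement).
  // _mm_set_epi8 lists bytes from highest to lowest, so byte 0 of the mask
  // is 15: output byte j is taken from input byte 15 - j.
  const __m128i mask =
      _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
  for (; done + 16 <= len; done += 16) {
    // Read a 16-byte chunk from the end of source, write it reversed to the
    // front of dest.
    const __m128i chunk = _mm_loadu_si128(
        reinterpret_cast<const __m128i*>(source + len - done - 16));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dest + done),
                     _mm_shuffle_epi8(chunk, mask));
  }
#endif
  // The unread prefix of source maps to the unwritten suffix of dest.
  ReverseBytesScalar(dest + done, source, len - done);
}

// Hypothetical timing helper: runs fn `iterations` times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeMs(Fn&& fn, int iterations) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Dividing the TimeMs result for the scalar function by the result for a SIMD variant, on the same buffer and iteration count, gives a speedup ratio of the kind listed above. Such ratios depend on the compiler, the flags (for example -mssse3 or -mavx2) and the CPU, so this sketch should not be expected to reproduce the exact figures.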
--
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes
