Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2.
......................................................................

Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3081/11/be/src/benchmarks/bswap-benchmark.cc
File be/src/benchmarks/bswap-benchmark.cc:

Line 49: // SIMD 57.73 2.18X
> Why is SIMD so much slower?

The SIMD case calls the new BitUtil::ByteSwap directly, and that function has an arch-selector branch for architecture compatibility, like:

    if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
      ByteSwapSIMD<32, ByteSwapAVX2_Unit>(dest, source, len);
    } else if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) {
      ByteSwapSIMD<16, ByteSwapSSE_Unit>(dest, source, len);
    } else {
      ByteSwapScalar(dest, source, len);
    }

I believe this branch is what causes the slowdown.


Line 222: MACRO_TEST_FBS_CASE(8, 5)
> This code can be further reduced by making 8,5 to 8,8 a loop.

Hi Jim. I am afraid that is not the case here. The code above is essentially a branch, not a loop. These macros are in a switch statement. If we wrap them in a loop, it will be something like:

    switch(data->FIXED_LEN_SIZE)
      for(...) {
        case STEP:
          for (int j = 0; j < data->num_values; ++j) {
            impala::ByteSwapScalar(buffer, &data->d##DX##_values[j], STEP);
          }
          return;
      }

This is not acceptable C/C++ code.


Line 260: /// FIXED_LEN_SIZE = 4: Decimal4Value, size of array element is 8x4 = 32bit
> Which values are of interest? Is 3?

Fixed in the new version.


Line 263: template <int NUM_BYTES, int FIXED_LEN_SIZE>
> Please explain this in the comment above.

Fixed in the new version.


Line 273: // The bit range is 1x8=8 ~ 4x8=32;
> The bit range of what? What are we doing with that bit range? where does "1

Fixed in the new version.


Line 537: MACRO_TEST_AVX2_CASE(16, 9)
> Again, this can be reduced in size with a loop over the second param.

Hi Jim. I am afraid that is not the case here. The code above is essentially a branch, not a loop. These macros are in a switch statement. If we wrap them in a loop, it will be something like:

    switch(data->FIXED_LEN_SIZE)
      for(...)
      {
        case STEP:
          for (int j = 0; j < data->num_values; ++j) {
            impala::ByteSwapScalar(buffer, &data->d##DX##_values[j], STEP);
          }
          return;
      }

This is not acceptable C/C++ code.


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes
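
A note on the Line 49 reply about the arch-selector branch. The sketch below only illustrates the two shapes being discussed: a feature test on every call, versus resolving the implementation once and calling through a function pointer. It is not the code in the patch. `Isa`, `ByteSwapScalarSketch`, `ByteSwapDispatch`, `ByteSwapFn` and `ResolveByteSwap` are hypothetical names invented for illustration, and every path falls back to the scalar loop so that the example stays portable. Whether hoisting the check would close the gap in the benchmark is something only a measurement could show.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a CPU feature query; not Impala's CpuInfo API.
enum class Isa { kScalar, kSse42, kAvx2 };

// Scalar byte swap: copies the bytes of source into dest in reverse order.
// dest and source must not overlap.
inline void ByteSwapScalarSketch(void* dest, const void* source, int len) {
  auto* dst = static_cast<uint8_t*>(dest);
  const auto* src = static_cast<const uint8_t*>(source);
  for (int i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}

// Shape 1: the selector runs on every call, as in the branch quoted above.
// A real version would call a different kernel in each arm; this sketch
// uses the scalar loop everywhere.
inline void ByteSwapDispatch(Isa isa, void* dest, const void* source, int len) {
  assert(len >= 0);
  if (isa == Isa::kAvx2) {
    ByteSwapScalarSketch(dest, source, len);  // placeholder for an AVX2 kernel
  } else if (isa == Isa::kSse42) {
    ByteSwapScalarSketch(dest, source, len);  // placeholder for an SSE kernel
  } else {
    ByteSwapScalarSketch(dest, source, len);
  }
}

// Shape 2: resolve the kernel once, then call it with no feature test
// in the hot loop.
using ByteSwapFn = void (*)(void*, const void*, int);

inline ByteSwapFn ResolveByteSwap(Isa isa) {
  switch (isa) {
    case Isa::kAvx2:   return &ByteSwapScalarSketch;  // placeholder
    case Isa::kSse42:  return &ByteSwapScalarSketch;  // placeholder
    case Isa::kScalar: return &ByteSwapScalarSketch;
  }
  return &ByteSwapScalarSketch;
}
```

A benchmark loop would call `ResolveByteSwap` once before the loop and then invoke the returned pointer per value, which trades the per-call feature test for an indirect call.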
