Youwei Wang has posted comments on this change. Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2. ......................................................................
Patch Set 17: (8 comments) Hi Jim. Thank you so much for providing so such precious ideas! http://gerrit.cloudera.org:8080/#/c/3081/17/be/src/util/bit-util.inline.h File be/src/util/bit-util.inline.h: Line 25: inline void ByteSwapScalarLoop(void* dst, const void* src, int len) { > Could you add a TODO here that this can be made more efficient using BSWAP? Good suggestion. DONE. Line 136: #ifndef IR_COMPILE > I'm not sure we need this. What makes you believe we do need them? Hi Jim. Would you please tell which clang version you are using to build the product-ready Impala? If the clang version < 3.8, the following code can't be recongized during the LLVM IR generaton. Clang 3.8 is okay according to https://godbolt.org/g/YTSXK5. So if you suggest so, I can upgrade Clang/LLVM to the latest version immediately. And such IR_COMPILE can be removed in this case. Line 144: __attribute__((target("sse4.2"))) > I don't think this uses anything beyond SSSE3. Done Line 156: const __m256i mask256i = _mm256_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, > I think you can make this constexpr and static without the run-time penalty Done Line 163: *reinterpret_cast<uint128_t*>(dst + 16) = part1; > I think you should vzeroupper at the end; see "Avoiding AVX-SSE Transition Hi Jim. Thank you so much for pointing that out! This is some very precious experience for me. Line 193: // Remaining bytes(<16) are dealt with scalar routine > They might be able to be done faster with pshufb. Add TODO, please Done Line 200: if ( len >= 32 ) { > In the current code, len is always <= 16, right? As such, ByteSwapAVX2_Unit Hi Jim. I have conducted some micro-benchmark and the result shows we can achieve over 3.32x performance boost if we replace the std::reverse_copy in StringFunctions::Reverse with BitUtil::ByteSwap on my AVX2-enabled workstation. I will conduct more benchmarks to justify the existance of this branch "if ( len >= 32 )". Line 217: ByteSwapScalar(dest, source, len); > This should be denoted the most LIKELY branch. Done -- To view, visit http://gerrit.cloudera.org:8080/3081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1 Gerrit-PatchSet: 17 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Youwei Wang <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Youwei Wang <[email protected]> Gerrit-HasComments: Yes
