Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or 
AVX2.
......................................................................


Patch Set 17:

(8 comments)

Hi Jim. Thank you so much for providing so such precious ideas!

http://gerrit.cloudera.org:8080/#/c/3081/17/be/src/util/bit-util.inline.h
File be/src/util/bit-util.inline.h:

Line 25: inline void ByteSwapScalarLoop(void* dst, const void* src, int len) {
> Could you add a TODO here that this can be made more efficient using BSWAP?
Good suggestion. DONE.


Line 136: #ifndef IR_COMPILE
> I'm not sure we need this. What makes you believe we do need them?
Hi Jim. Would you please tell which clang version you are using to build the 
product-ready Impala? If the clang version < 3.8, the following code can't be 
recongized during the LLVM IR generaton. Clang 3.8 is okay according to 
https://godbolt.org/g/YTSXK5. So if you suggest so, I can upgrade Clang/LLVM to 
the latest version immediately. And such IR_COMPILE can be removed in this case.


Line 144: __attribute__((target("sse4.2")))
> I don't think this uses anything beyond SSSE3.
Done


Line 156:   const __m256i mask256i = _mm256_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11,
> I think you can make this constexpr and static without the run-time penalty
Done


Line 163:   *reinterpret_cast<uint128_t*>(dst + 16) = part1;
> I think you should vzeroupper at the end; see "Avoiding AVX-SSE Transition 
Hi Jim. Thank you so much for pointing that out! This is some very precious 
experience for me.


Line 193:   // Remaining bytes(<16) are dealt with scalar routine
> They might be able to be done faster with pshufb. Add TODO, please
Done


Line 200:   if ( len >= 32 ) {
> In the current code, len is always <= 16, right? As such, ByteSwapAVX2_Unit
Hi Jim. I have conducted some micro-benchmark and the result shows we can 
achieve over 3.32x performance boost if we replace the std::reverse_copy in 
StringFunctions::Reverse with BitUtil::ByteSwap on my AVX2-enabled workstation. 
I will conduct more benchmarks to justify the existance of this branch "if ( 
len >= 32 )".


Line 217:   ByteSwapScalar(dest, source, len);
> This should be denoted the most LIKELY branch.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 17
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes

Reply via email to