Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: improve ByteSwap with builtin function or SSE or 
AVX2
......................................................................


Patch Set 3:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/3081/3/be/src/benchmarks/bswap-benchmark.cc
File be/src/benchmarks/bswap-benchmark.cc:

Line 35: //                         Impala               317.2                  
1X
> These names are pretty confusing. Aren't all of the benchmarked functions i
This benchmark is previously written by Zuo Wang. His idea is demonstrating all 
possible byteswap approaches for all possible data length. Here Impala refers 
to the old Impala implementation. Bswap refers to the improved scalar 
byte-swap. SSE/AVX2 refers to the byteswap in the SIMD approach.


Line 151: inline void ByteSwapAVX2(void* dest, const void* source, int len) {
> Agreed, we should benchmark the actual implementations in bit-util.h, inclu
I have done that. Please check the function below:
template<int num_bytes>
void TestSIMD(int batch_size, void* d) {

This function calls the BitUtil::Byteswap in the bit-util.h directly.

The bswap-benchmark binary generated by make_release.sh emits Segmentation 
fault error so I am not able to run it at this moment. Tim has explained this 
is due to the absence of Kudu in the runtime envrionment. I am trying to 
install Kudu and check if this issue can be solved.

PS: The binary generated by make_debug.sh can run. But I believe the benchmark 
result may be not persuasive engouh for you to accept.


Line 156:   if (num_bytes == 4) {
> This is a lot of code with a lot of near-repetition and no comments. It cou
Zuo Wang's idea is implementing a finer-grain level benchmark here. For 
example, since the data width of AVX2 is 256bit, we can still use 64/128bit of 
it to manipulate data partly. I can add more comments about this.


http://gerrit.cloudera.org:8080/#/c/3081/3/be/src/util/bit-util.inline.h
File be/src/util/bit-util.inline.h:

Line 140:   const __m128i mask = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 14,
> Can you lift this out of the function into a global that gets initialized o
I am afraid I can't do this for the initializer _mm_set_epi8/_mm256_set_epi8 
can't compile without      __attribute__((target("sse4.2"/"avx2"))). And such 
attributes can only be applied to functions. But I think adding "static" 
modifier may help.


Line 164:   const __m256i mask = _mm256_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
10, 11, 12, 13, 14,
> Same comment here about lifting to a global.
Done


Line 168:   __int128_t part1 = *reinterpret_cast<__int128_t*>(dst);
> Than this be uint128_t?
Done


Line 174: inline void ByteSwapAVX2(void* dest, const void* source, int len) {
> You can combine this with ByteSwapSSE by making 32 and ByteSwapAVX2_Unit te
Done


Line 185:   ByteSwapScalar(dst, src, i);
> Is there a benefit to using ByteSwapSSE_Unit here if there is >= 16 left?
Done


Line 189:   if(len >= 32){
> Do we need the len check? What if we just call AVX/SSE always?
The AVX routine only suits for data length >= 256bit. And the SSE routine only 
suits for data length >= 128bit. So this is why we need the branch here. If AVX 
routine is applied to data length < 256bit, some unknown bits which are outside 
the boundary will be loaded. In this case, byte swapping will definitely 
results in wrong answers.


Line 191:     if ( CpuInfo::IsSupported(CpuInfo::AVX2) ){
> Space between ) and {
Done


Line 193:     } else if( CpuInfo::IsSupported(CpuInfo::SSE4_2) ){
> Fix spaces:
Done


Line 193:     } else if( CpuInfo::IsSupported(CpuInfo::SSE4_2) ){
> LIKELY(CpuInfo:: ....
Done


Line 198:   } else if( len >= 16){
> Fix spaces
Done


Line 200:     if( CpuInfo::IsSupported(CpuInfo::SSE4_2) ){
> Fix spaces
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes

Reply via email to