[Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2.

Youwei Wang (Code Review) Tue, 24 May 2016 19:54:52 -0700

Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2.
......................................................................



Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3081/11/be/src/benchmarks/bswap-benchmark.cc
File be/src/benchmarks/bswap-benchmark.cc:

Line 49: //                           SIMD               57.73               2.18X
> Why is SIMD so much slower?
The SIMD case calls the new BitUtil::ByteSwap directly, and that function 
contains a branch that selects an implementation based on what the CPU 
supports, like:
  if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
    ByteSwapSIMD<32, ByteSwapAVX2_Unit>(dest, source, len);
  } else if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) {
    ByteSwapSIMD<16, ByteSwapSSE_Unit>(dest, source, len);
  } else {
    ByteSwapScalar(dest, source, len);
  }
I believe this branch is what causes the slowdown.
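
One way to check this hypothesis would be to resolve the implementation once 
and call it through a function pointer, so the per-call feature checks 
disappear from the measured loop. The following is only a minimal sketch under 
stated assumptions: ByteSwapScalarSketch, SelectByteSwap and 
ByteSwapDispatched are hypothetical names, the CPU features are passed in as 
plain bools instead of coming from CpuInfo, and only the scalar path is 
implemented, so it is not the code in this patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the scalar fallback: writes the `len` bytes of
// `source` into `dest` in reverse order. The buffers must not overlap.
inline void ByteSwapScalarSketch(void* dest, const void* source, int len) {
  uint8_t* dst = static_cast<uint8_t*>(dest);
  const uint8_t* src = static_cast<const uint8_t*>(source);
  for (int i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}

using ByteSwapFn = void (*)(void*, const void*, int);

// Picks an implementation from the (assumed) CPU feature flags. A real build
// would return an AVX2 or SSE kernel here; this sketch only has the scalar
// one, so every path resolves to it.
inline ByteSwapFn SelectByteSwap(bool has_avx2, bool has_sse42) {
  (void)has_avx2;
  (void)has_sse42;
  return &ByteSwapScalarSketch;
}

// Resolved once at startup rather than on every call.
static const ByteSwapFn kByteSwap = SelectByteSwap(false, false);

// Per-call cost is one indirect call, with no feature checks in the hot path.
inline void ByteSwapDispatched(void* dest, const void* source, int len) {
  kByteSwap(dest, source, len);
}
```

Benchmarking ByteSwapDispatched against a version that re-checks the features 
on each call would show whether the branch accounts for the gap.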


Line 222:         MACRO_TEST_FBS_CASE(8, 5)
> This code can be further reduced by making 8,5 to 8,8 a loop.
Hi Jim. I am afraid that is not the case here. The code above is essentially a 
branch, not a loop. These macros are inside a switch statement. If we wrapped 
them in a loop, it would look something like:
switch(data->FIXED_LEN_SIZE)
for(...) {
 case STEP:
    for (int j = 0; j < data->num_values; ++j) {
      impala::ByteSwapScalar(buffer, &data->d##DX##_values[j], STEP);
    }
    return;
}
This is not acceptable C/C++ code: a case label must be a compile-time 
constant, so it cannot be a loop variable.
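
If the repetition is still a concern, one alternative to a runtime loop is to 
let templates generate the chain of cases, so each size stays a compile-time 
constant. The following is a minimal sketch and not the benchmark's code: 
SwapFixed and SizeDispatch are hypothetical names, and the values are assumed 
to be packed back to back in a byte array.

```cpp
#include <cstdint>

// Hypothetical fixed-size kernel: N must be known at compile time, which is
// why a runtime loop variable cannot be used to select it.
template <int N>
inline void SwapFixed(uint8_t* dest, const uint8_t* source) {
  for (int i = 0; i < N; ++i) dest[i] = source[N - 1 - i];
}

// Expands at compile time into one comparison per size in [LO, HI], playing
// the role of the repeated case macros.
template <int LO, int HI, bool DONE = (LO > HI)>
struct SizeDispatch {
  static bool Run(int size, uint8_t* out, const uint8_t* in, int num_values) {
    if (size == LO) {
      for (int j = 0; j < num_values; ++j) {
        SwapFixed<LO>(out + j * LO, in + j * LO);
      }
      return true;
    }
    return SizeDispatch<LO + 1, HI>::Run(size, out, in, num_values);
  }
};

// Terminates the recursion: no size in the range matched.
template <int LO, int HI>
struct SizeDispatch<LO, HI, true> {
  static bool Run(int, uint8_t*, const uint8_t*, int) { return false; }
};
```

A call such as SizeDispatch<5, 8>::Run(size, out, in, n) would then cover the 
sizes 5 through 8 with one line, at the cost of a chain of comparisons instead 
of a switch.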


Line 260: /// FIXED_LEN_SIZE = 4: Decimal4Value, size of array element is 8x4 = 32bit
> Which values are of interest? Is 3?
Fixed in the new version


Line 263: template <int NUM_BYTES, int FIXED_LEN_SIZE>
> Please explain this in the comment above.
Fixed in the new version


Line 273:     // The bit range is 1x8=8 ~ 4x8=32;
> The bit range of what? What are we doing with that bit range? where does "1
Fixed in the new version


Line 537:         MACRO_TEST_AVX2_CASE(16, 9)
> Again, this can be reduced in size with a loop over the second param.
Hi Jim. I am afraid that is not the case here. The code above is essentially a 
branch, not a loop. These macros are inside a switch statement. If we wrapped 
them in a loop, it would look something like:
switch(data->FIXED_LEN_SIZE)
for(...) {
 case STEP:
    for (int j = 0; j < data->num_values; ++j) {
      impala::ByteSwapScalar(buffer, &data->d##DX##_values[j], STEP);
    }
    return;
}
This is not acceptable C/C++ code: a case label must be a compile-time 
constant, so it cannot be a loop variable.


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes
