Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or 
AVX2.
......................................................................


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3081/15/be/src/benchmarks/bswap-benchmark.cc
File be/src/benchmarks/bswap-benchmark.cc:

> Please also add tests to bit-util-test.cc
Hi Jim. Our current modification to the interface BitUtil::ByteSwap only 
happens inside its body. So there has already existed some test cases in 
TEST(BitUtil, ByteSwap). So do you mean adding extra test cases in this 
function TEST(BitUtil, ByteSwap)? Or maybe you mean adding some test cases for 
the functions like ByteSwapSSE_Unit/ByteSwapAVX2_Unit?


Line 124: void TestSSE42Swap(int batch_size, void* d) {
> When I asked earlier if we care about FIXED_LEN_SIZE, I wasn't trying to be
Hi Jim. Thank you so much for providing this precise information here.
After some investigation, I think my previous understanding the functionality 
of the API 
“inline void BitUtil::ByteSwap(void* dest, const void* source, int len)“ is 
quite different as it actually turns out to be. 

As I can tell from the code, BitUtil::ByteSwap seems to serve the callers mostly
DecimalUtil::DecodeFromFixedLenByteArray(buffer, fixed_len_size, v);
DecimalUtil::EncodeFromFixedLenByteArray(buffer, fixed_len_size, v);

Obviously, in this case, the input parament “len” of this function will be 
4bytes corresponding to Decimal4Value, 8bytes corresponding to Decimal8Value 
and 16bytes corresponding to Decimal16Value. Actually, my current code  
 “inline void BitUtil::ByteSwap(void* dest, const void* source, int len)“ will 
not utilize the SIMD feature to deal with the input array if the len < 16. So 
we can expect no performance improvement in this case.

Actually, concluded from previous patch, the code has a priori knowledge about 
what it is going to deal with. 
So this is the very meaning of this function in the benchmark:
template <int num_bytes, int fixed_len_size> 
inline void ByteSwapAVX2(void* dest, const void* source, int len) 

int num_bytes: tell which kind of Decimal values to deal with. That is, 4bytes 
corresponding to Decimal4Value, 8bytes corresponding to Decimal8Value and 
16bytes corresponding to Decimal16Value.
int fixed_len_size: the internal bytes of corresponding Decimal values to swap; 
just like you have mentioned previously:
After some digging, it looks to me like we sometimes do care about unusual 
FIXED_LEN_SIZE. See parquet-common.h, which includes the comment:
    /// fixed_len_size can be less than sizeof(Decimal*Value) for space 
savings. This means                                                      
    /// that the value in the in-memory format has leading zeros or negative 
1's.                                                                
    /// For example, precision 2 fits in 1 byte. All decimals stored as 
Decimal4Value                                                            
    /// will have 3 bytes of leading zeros, we will only store the interesting 
byte.
Dest: the pointer to the destination memory address;
Source: the pointer to the source memory address;
Len: the number of Decimal values; 

If my understanding is true, I afraid the AVX2 optimization can not be used 
inside the BitUtil::ByteSwap since it will only deal with at most 128Bit data. 
And SSE4.2 optimization will only be used to deal with the Decimal16Value. Our 
previous patch shows the high efficiency of using SIMD approach when dealing 
with a long DecimalXValue array.  So my idea now is to find the callers of 
EncodeToFixedLenByteArray and DecodeToFixedLenByteArray and use our previous 
patch to substitute these two functions if they are used to encode/decode a 
very long DecimalXValue array. And I believe this is the essential idea of 
previous patch. 

I am looking forward to hearing your idea. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 15
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes

Reply via email to