Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.
......................................................................

Patch Set 40:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3081/40/be/src/util/bit-util.cc
File be/src/util/bit-util.cc:

Line 170: const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
> 1. I find this doc inscrutable without more labeling. Are the four differen

Hi Jim.

1. I am sorry for the coarse first version of the table; I was a little lost because of the weird push issue mentioned on the mailing list. Please read the table as two parts. The first part, colored in blue, is the benchmark result when using the template parameter without a branch. The second part, colored in red, is the benchmark result when not using the template parameter but with a branch. Each part includes five runs. Each run yields three measurements for each of FastScalar, SSSE3, AVX2 and SIMD, so each run gives one average for each of FastScalar, SSSE3, AVX2 and SIMD. Across all five runs, we then get the FINAL average for each of FastScalar, SSSE3, AVX2 and SIMD. After both parts are done, I copy the final averages of each part and show them side by side for easier comparison, so I believe we can reach a quick conclusion by going through the final table. I have colored some table columns to make it easier to read. If you are interested, would you please revisit the sheet link? Please feel free to tell me if the table is still confusing. Thank you.

2. I have used the objdump tool to check the assembly code in the libUtil.a binary. I have copied the assembly code of the different implementations of the template function (with and without the function pointer in the template parameter list) to an online document:
https://docs.google.com/document/d/1bCCjKPg7ytpbRTeC6UrnxoSDHCp0IOAVsQOdcQTrM9M/edit?usp=sharing
As you can see there, the two different codebases generate the same libUtil.a binary (they have the same md5sum value).
Based on this fact, I guess the compiler's optimization has taken care of this issue. Please share any ideas you have. Thank you. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 40
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes
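
For readers following the comment on be/src/util/bit-util.cc, the two variants being compared (kernel chosen by a function-pointer template parameter versus kernel chosen by a runtime branch) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the patch: the names ScalarByteSwap, PairwiseByteSwap, ByteSwapTemplated, ByteSwapBranching and Kernel are hypothetical, and a portable loop stands in for the SSSE3/AVX2 paths so the example compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar kernel: writes the bytes of src into dst in reverse order.
// src and dst must not overlap.
inline void ScalarByteSwap(const uint8_t* src, int len, uint8_t* dst) {
  assert(len >= 0);
  for (int i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}

// Hypothetical second kernel, a portable stand-in for a SIMD path.
// It fills dst from both ends at once; src and dst must not overlap.
inline void PairwiseByteSwap(const uint8_t* src, int len, uint8_t* dst) {
  assert(len >= 0);
  int lo = 0;
  int hi = len - 1;
  for (; lo < hi; ++lo, --hi) {
    dst[lo] = src[hi];
    dst[hi] = src[lo];
  }
  if (lo == hi) dst[lo] = src[lo];  // middle byte when len is odd
}

using SwapFn = void (*)(const uint8_t*, int, uint8_t*);

// Variant A (assumed shape): the kernel is a function-pointer template
// parameter, so the call target is fixed at compile time and no branch is needed.
template <SwapFn kKernel>
void ByteSwapTemplated(const void* source, int len, void* dest) {
  const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
  uint8_t* dst = reinterpret_cast<uint8_t*>(dest);
  kKernel(src, len, dst);
}

// Variant B (assumed shape): no template parameter; the kernel is picked by a
// runtime branch on every call.
enum class Kernel { kScalar, kPairwise };

inline void ByteSwapBranching(Kernel kernel, const void* source, int len, void* dest) {
  const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
  uint8_t* dst = reinterpret_cast<uint8_t*>(dest);
  if (kernel == Kernel::kScalar) {
    ScalarByteSwap(src, len, dst);
  } else {
    PairwiseByteSwap(src, len, dst);
  }
}
```

A caller would write, for example, ByteSwapTemplated<PairwiseByteSwap>(in, n, out) or ByteSwapBranching(Kernel::kPairwise, in, n, out). When the kernel is known at each call site, an optimizing compiler can often inline both forms into the same machine code, which would be consistent with the identical md5sum reported above, but that depends on the compiler and flags and is worth confirming with objdump as was done here.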
