Youwei Wang has posted comments on this change. Change subject: IMPALA-3604: Clean up SSE handling ......................................................................
Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/3227/2/be/src/util/sse-util.h File be/src/util/sse-util.h: > Have you tested this version of Impala (when compiled on an sse4.2 machine) Greetings, Jim. Q1: Have you tested this version of Impala (when compiled on an sse4.2 machine) running on non-sse4.2 machines? Answer: I believe according to the compiler principle, it doesn't matter too much what kind machine you are using to compile the codebase. The generated binary matters. As for non-sse4.2 CPUs, they are pretty rare now, especially in modern servers. I don't have available one to test, but I will try to find one. Q2: Have you tried compiling this on non-sse4.2 machines? Answer: I don't have available one at hand now, but I will try to find one. Q3: Have you looked at the generated code to check that the intrinsics aren't generating worse code? This has been known to happen. Answer: The tool I am using is this: http://gcc.godbolt.org/#. My approach is to compare the generated ASM code for two different versions of one same function. The compiler I choosed is GCC 4.9.2/GCC 5.1/GCC 6.1. Generally speaking, after revising the code, the intrinsic-style function will generate longer ASM codes than the old ones, but not too much. (3~4+ instructions so far as I can see) Thank you. Line 94: __asm__ volatile ("pcmpestrm %5, %2, %1" > The toolchain uses clang 3.8 now, so we can depend on that. Done Line 130: } > crc32 is it's own gcc target, I think. Greetings, Jim. According to your provided link, I have found something related to the crc32: -mcrc32: This option enables built-in functions __builtin_ia32_crc32qi, __builtin_ia32_crc32hi, __builtin_ia32_crc32si and __builtin_ia32_crc32di to generate the crc32 machine instruction. But it seems this is not related to the intrinsic _mm_crc32_u64 we used here. And if you don't mind, would you please take a look at this link? https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_crc32_u64&expand=1222 I just copy some content as following: unsigned __int64 _mm_crc32_u64 (unsigned __int64 crc, unsigned __int64 v) Synopsis unsigned __int64 _mm_crc32_u64 (unsigned __int64 crc, unsigned __int64 v) #include "nmmintrin.h" Instruction: crc32 r64, r64 CPUID Flags: SSE4.2 Description Starting with the initial value in crc, accumulates a CRC32 value for unsigned 64-bit integer v, and stores the result in dst. Operation tmp1[63:0] := v[0:63] // bit reflection tmp2[31:0] := crc[0:31] // bit reflection tmp3[95:0] := tmp1[31:0] << 32 tmp4[95:0] := tmp2[63:0] << 64 tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0] tmp6[31:0] := tmp5[95:0] MOD2 0x11EDC6F41 dst[31:0] := tmp6[0:31] // bit reflection Performance Architecture Latency Throughput Haswell 3 - Ivy Bridge 3 - Sandy Bridge 3 - It seems its corresponding target is still -msse4.2. Line 132: __attribute__((target("sse4.2"))) > popcnt is its own target, I think. Done -- To view, visit http://gerrit.cloudera.org:8080/3227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id346187f3248922ec2a323548de4bdeaccd7a771 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Youwei Wang <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Youwei Wang <[email protected]> Gerrit-HasComments: Yes
