Youwei Wang has posted comments on this change.

Change subject: IMPALA-3604: Clean up SSE handling
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3227/2/be/src/util/sse-util.h
File be/src/util/sse-util.h:

> Have you tested this version of Impala (when compiled on an sse4.2 machine)
Greetings, Jim. 

Q1: Have you tested this version of Impala (when compiled on an sse4.2 machine) 
running on non-sse4.2 machines?

Answer: I believe according to the compiler principle, it doesn't matter too 
much what kind machine you are using to compile the codebase. The generated 
binary matters. As for non-sse4.2 CPUs, they are pretty rare now, especially in 
modern servers. I don't have available one to test, but I will try to find one.

Q2: Have you tried compiling this on non-sse4.2 machines?

Answer: I don't have available one at hand now, but I will try to find one.

Q3: Have you looked at the generated code to check that the intrinsics aren't 
generating worse code? This has been known to happen.

Answer: The tool I am using is this: http://gcc.godbolt.org/#. My approach is 
to compare the generated ASM code for two different versions of one same 
function. The compiler I choosed is GCC 4.9.2/GCC 5.1/GCC 6.1. Generally 
speaking, after revising the code, the intrinsic-style function will generate 
longer ASM codes than the old ones, but not too much. (3~4+ instructions so far 
as I can see) 

Thank you.


Line 94:   __asm__ volatile ("pcmpestrm %5, %2, %1"
> The toolchain uses clang 3.8 now, so we can depend on that.
Done


Line 130: }
> crc32 is it's own gcc target, I think.
Greetings, Jim.
According to your provided link, I have found something related to the crc32:
-mcrc32: 
This option enables built-in functions __builtin_ia32_crc32qi, 
__builtin_ia32_crc32hi, __builtin_ia32_crc32si and __builtin_ia32_crc32di to 
generate the crc32 machine instruction. 

But it seems this is not related to the intrinsic _mm_crc32_u64 we used here. 
And if you don't mind, would you please take a look at this link?
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_crc32_u64&expand=1222

I just copy some content as following:
unsigned __int64 _mm_crc32_u64 (unsigned __int64 crc, unsigned __int64 v)
Synopsis
unsigned __int64 _mm_crc32_u64 (unsigned __int64 crc, unsigned __int64 v)
#include "nmmintrin.h"
Instruction: crc32 r64, r64
CPUID Flags: SSE4.2
Description
Starting with the initial value in crc, accumulates a CRC32 value for unsigned 
64-bit integer v, and stores the result in dst.
Operation
tmp1[63:0] := v[0:63] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[95:0] := tmp1[31:0] << 32
tmp4[95:0] := tmp2[63:0] << 64
tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0]
tmp6[31:0] := tmp5[95:0] MOD2 0x11EDC6F41
dst[31:0] := tmp6[0:31] // bit reflection

Performance
Architecture  Latency Throughput
Haswell 3 -
Ivy Bridge  3 -
Sandy Bridge  3 -

It seems its corresponding target is still -msse4.2.


Line 132: __attribute__((target("sse4.2")))
> popcnt is its own target, I think.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id346187f3248922ec2a323548de4bdeaccd7a771
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Youwei Wang <[email protected]>
Gerrit-HasComments: Yes

Reply via email to