Jim Apple has posted comments on this change.

Change subject: Use AVX2 operations to speedup Bloom filters by 10-100%.
......................................................................

Patch Set 5:

> It just occurred to me that this could give incorrect results
> running on a mixed cluster of avx2/non-avx2 machines.
>
> Would it make sense to just use the avx2-optimised layout for the
> non-avx2 case?

Good point, Tim! Using the same layout is certainly possible. Using the same hash functions, however, would slow down the non-avx2 code.

The reason is that, between PS4 and PS5, I started using the vpmulld instruction to rehash the 32-bit value by multiplying it by 8 different odd 32-bit constants and taking the top 5 bits of each product. In the serial code, I multiply by two different 64-bit constants using Rehash32to64, add other 64-bit constants, then take the top 32 bits of each result. Switching the serial code to eight 32-bit multiplications would be a good bit slower, I suspect.

This could be alleviated using pmulld, which can perform four 32-bit multiplications with one instruction, but that was added in SSE4.1.

I see two options:

1. Leave some performance on the table with this commit by moving back to PS4.

2. Take a regression for pre-sse4.1 machines (ended in 2008ish for Intel, 2012ish for AMD, if I'm reading correctly) and a bigger speedup for more modern machines.

I have another change I've already started testing that increases the gap between #1 and #2 by another 50-100%.

Tim, Dan: what do you think is the right choice?

--
To view, visit http://gerrit.cloudera.org:8080/3338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6fef4f6652876f8fd7e3f0e41431702380418c98
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: No
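
For readers following the thread, the rehash described in the comment (multiply the 32-bit hash by 8 different odd 32-bit constants and take the top 5 bits of each product) can be sketched in scalar C++ roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the names MakeMask, BucketInsert and BucketFind, the 8-word bucket layout, and the multiplier values are all hypothetical. A scalar loop like this would produce the same bits as a vpmulld-based version that used the same constants, which is the property a mixed avx2/non-avx2 cluster would need.

```cpp
#include <array>
#include <cstdint>

// Illustrative odd multipliers. These are assumptions for this sketch and
// are NOT taken from the patch under review.
constexpr std::array<uint32_t, 8> kOddMultipliers = {
    0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
    0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// Builds an 8-word mask with exactly one bit set in each 32-bit word.
inline std::array<uint32_t, 8> MakeMask(uint32_t hash) {
  std::array<uint32_t, 8> mask{};
  for (int i = 0; i < 8; ++i) {
    // Unsigned 32-bit multiplication wraps modulo 2^32, which matches the
    // low 32 bits that each lane of vpmulld keeps.
    const uint32_t product = hash * kOddMultipliers[i];
    // The top 5 bits of the product select a bit position in [0, 32).
    mask[i] = uint32_t{1} << (product >> 27);
  }
  return mask;
}

// Hypothetical 256-bit bucket: sets the 8 selected bits.
inline void BucketInsert(std::array<uint32_t, 8>& bucket, uint32_t hash) {
  const std::array<uint32_t, 8> mask = MakeMask(hash);
  for (int i = 0; i < 8; ++i) bucket[i] |= mask[i];
}

// Returns true only if all 8 selected bits are already set in the bucket.
inline bool BucketFind(const std::array<uint32_t, 8>& bucket, uint32_t hash) {
  const std::array<uint32_t, 8> mask = MakeMask(hash);
  for (int i = 0; i < 8; ++i) {
    if ((bucket[i] & mask[i]) == 0) return false;
  }
  return true;
}
```

The loop performs eight dependent-free 32-bit multiplications per key, which is the extra serial work the comment expects to be slower than the two 64-bit multiplications used in the existing non-avx2 path.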
