On 10/12/2016 09:35 PM, Stefan Koch wrote:
On Thursday, 13 October 2016 at 01:27:35 UTC, Andrei Alexandrescu wrote:
On 10/12/2016 08:41 PM, safety0ff wrote:
On Thursday, 13 October 2016 at 00:32:36 UTC, safety0ff wrote:
It made little difference: LDC compiled into AVX2 vectorized addition
(vpmovzxbq & vpaddq).
Measurements without -mcpu=native:
without branch hints 0.852s
code pasted 0.766s
So we should be able to reduce overhead by means of proper code
arrangement and interplay of inlining and outlining. The prize,
however, would be to get the AVX instructions for ASCII going. Is that
possible? -- Andrei
AVX for ASCII?
What are you referring to?
Most text processing is terribly incompatible with SIMD.
SSE 4.2 has a few instructions that do help, but as far as I am aware it
is not yet very widespread.
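
For what "ASCII going wide" could mean in practice, here is a minimal sketch of one text operation that does map onto wide registers: checking whether a buffer is pure ASCII. It is written in portable C++ with 64-bit words rather than AVX or SSE 4.2 intrinsics, and the function name `is_all_ascii` is hypothetical; nothing like it appears in the thread or, as far as this sketch assumes, in the benchmarked code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the thread: returns true if every byte
// in [data, data + n) is below 0x80, i.e. the buffer is pure ASCII.
bool is_all_ascii(const unsigned char* data, std::size_t n) {
    std::size_t i = 0;
    std::uint64_t acc = 0;
    // Wide part: OR eight bytes at a time into one accumulator.
    for (; i + 8 <= n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof word);  // safe unaligned load
        acc |= word;
    }
    // Tail: the remaining 0..7 bytes land in the low byte of acc.
    for (; i < n; ++i) {
        acc |= data[i];
    }
    // Any byte with its high bit set means non-ASCII input.
    return (acc & 0x8080808080808080ULL) == 0;
}
```

The check has no data-dependent branch inside the loop, which is what makes it friendly to wider registers; decoding the non-ASCII sequences themselves is the part that stays hard to vectorize.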
Oh ok, so it's that checksum in particular that got optimized. Bad
benchmark! Bad! -- Andrei
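
To make the checksum point concrete, here is a rough sketch of the kind of loop that produces this result. It assumes the benchmark's checksum was a plain sum of bytes widened to 64 bits, which is what the vpmovzxbq (zero-extend packed bytes to quadwords) and vpaddq (add packed quadwords) pair suggests; the actual benchmark code is not shown in the thread, and `byte_sum` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed shape of the benchmark checksum: each byte is zero-extended
// to 64 bits and added to a running total.
std::uint64_t byte_sum(const unsigned char* data, std::size_t n) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];  // plain reduction, no data-dependent branch
    }
    return sum;
}
```

A reduction like this is something an optimizer can split across vector lanes on its own, so the speedup reflects the checksum loop rather than the text handling around it. Whether a given compiler actually vectorizes it depends on its version and flags.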