I am continuing to play with SIMD. I tried to use std.simd,
but much of it is still unimplemented. I also gave up on core.simd.__simd because of problems with the PMOVMSKB instruction (it is not implemented).

Today I was playing with memchr for gdc:
memchr: http://www.cplusplus.com/reference/cstring/memchr/
My implementations with benchmark:
http://dpaste.dzfl.pl/4c46c0cf340c

Benchmark results:
-----
Naive:        21.9      TickDuration(136456491)
SIMD:         3.04      TickDuration(18920182)
SIMDM:        2.44      TickDuration(15232176)
SIMDU:         1.8      TickDuration(11210454)
C:               1      TickDuration(6233963)
-----

The middle column is the duration relative to the C implementation (core.stdc.string).

memchrSIMD splits the input into three parts: an unaligned beginning, an aligned middle, and an unaligned end.
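To make the three-way split concrete, here is a minimal sketch of how the part sizes could be computed. It is not the code from the paste: `Split` and `splitForSimd` are hypothetical names, and the 16-byte block size is an assumption based on the ubyte16 vectors used in the SIMD loop.

```d
// Hypothetical helper, not taken from the linked paste.
struct Split
{
    size_t head; // bytes before the first 16-byte boundary
    size_t mid;  // bytes covered by whole, aligned 16-byte blocks
    size_t tail; // bytes left over after the last whole block
}

Split splitForSimd(const(void)* ptr, size_t length)
{
    enum size_t blockSize = 16; // assumed SSE vector width in bytes
    size_t addr = cast(size_t) ptr;

    // Distance to the next 16-byte boundary (0 if already aligned),
    // clamped so it never exceeds the input length.
    size_t head = (blockSize - addr % blockSize) % blockSize;
    if (head > length)
        head = length;

    size_t rest = length - head;
    size_t tail = rest % blockSize;
    return Split(head, rest - tail, tail);
}
```

The head and tail would be scanned byte by byte, and only the middle part would go through the aligned SIMD loop.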

memchrSIMDM uses this code instead of pmovmskb:
------
        if (Mask mask = *cast(Mask*)(result.array.ptr))
        {
            return ptr + bsf(mask) / BitsInByte;
        }
        else if (Mask mask = *cast(Mask*)(result.array.ptr + Mask.sizeof))
        {
            return ptr + bsf(mask) / BitsInByte + cast(int)Mask.sizeof;
        }
------
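The same idea can be tried without any SIMD types. The sketch below assumes that Mask is a 64-bit unsigned integer and that the target is little-endian (as x86 is). `Block` and `firstNonZeroByte` are hypothetical names, and a union stands in for the pointer cast so that the 64-bit reads are properly aligned.

```d
import core.bitop : bsf;

// Hypothetical view of a 16-byte comparison result as two 64-bit words.
union Block
{
    ubyte[16] bytes;
    ulong[2] words;
}

// Index of the first nonzero byte, or -1 if all 16 bytes are zero.
// Assumes little-endian byte order, so the lowest set bit of a word
// belongs to the lowest-addressed nonzero byte.
int firstNonZeroByte(ref const Block result)
{
    enum bitsInByte = 8;

    ulong lo = result.words[0]; // bytes 0 .. 7
    if (lo)
        return bsf(lo) / bitsInByte;

    ulong hi = result.words[1]; // bytes 8 .. 15
    if (hi)
        return bsf(hi) / bitsInByte + cast(int) ulong.sizeof;

    return -1;
}
```

Dividing the bit index by 8 turns it into a byte index, and the second branch adds 8 because it looks at the upper half of the result.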

memchrSIMDU (unaligned) applies SIMD instructions starting from the first array element.

SIMD part of function:
------
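        // Broadcast the searched byte into all 16 lanes.
        ubyte16 niddles;
        niddles.ptr[0..16] = value;
        ubyte16 result;
        ubyte16 arr;

        for (; ptr < alignedEnd; ptr += 16)
        {
            // Copy the next 16 input bytes into a vector.
            arr.ptr[0..16] = ptr[0..16];
            // Each result byte becomes 0xFF where the lanes are equal.
            result = __builtin_ia32_pcmpeqb128(arr, niddles);
            // Collect the top bit of every byte into a 16-bit mask.
            int i = __builtin_ia32_pmovmskb128(result);
            if (i != 0)
            {
                // The lowest set bit is the offset of the first match.
                return ptr + bsf(i);
            }
        }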
------
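For anyone without gdc at hand, here is a scalar stand-in for the pcmpeqb/pmovmskb pair that shows why `ptr + bsf(i)` points at the first match. It is only an illustration of what the mask contains, not a fast implementation; `matchMask` and `firstMatch` are hypothetical names.

```d
import core.bitop : bsf;

// Scalar stand-in for pcmpeqb followed by pmovmskb:
// bit i of the returned mask is set when block[i] == value.
uint matchMask(ref const ubyte[16] block, ubyte value)
{
    uint mask = 0;
    foreach (i, b; block)
    {
        if (b == value)
            mask |= 1u << i;
    }
    return mask;
}

// Offset of the first matching byte in the block, or -1 if none.
int firstMatch(ref const ubyte[16] block, ubyte value)
{
    uint mask = matchMask(block, value);
    // bsf is undefined for 0, so the empty mask is handled separately.
    return mask ? bsf(mask) : -1;
}
```

Because bit i of the mask corresponds to byte i of the block, no division by 8 is needed here, unlike in the memchrSIMDM variant.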
