Benchmark memchar (with GCC builtins)

Iakh via Digitalmars-d Fri, 30 Oct 2015 14:31:14 -0700

I continue to play with SIMD. I tried to use std.simd, but it still has a lot left to be implemented. I also gave up on core.simd.__simd because of problems with the PMOVMSKB instruction (it is not implemented).


Today I was playing with memchr for gdc:
memchr: http://www.cplusplus.com/reference/cstring/memchr/
My implementations with benchmark:
http://dpaste.dzfl.pl/4c46c0cf340c

Benchmark results:
-----
Naive:        21.9      TickDuration(136456491)
SIMD:         3.04      TickDuration(18920182)
SIMDM:        2.44      TickDuration(15232176)
SIMDU:         1.8      TickDuration(11210454)
C:               1      TickDuration(6233963)
-----

The middle column is the duration relative to the C implementation (core.stdc.string).
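The relative figures can be recomputed from the tick counts. A minimal sketch, assuming the ratios are simply each tick count divided by the C tick count; `relative` is a hypothetical helper and is not part of the linked benchmark:

```d
import std.stdio : writefln;

// Hypothetical helper (not from the linked benchmark):
// duration expressed as a multiple of a baseline duration.
double relative(long ticks, long baseline)
{
    return cast(double) ticks / baseline;
}

void main()
{
    // Tick counts copied from the results above; C is the baseline.
    immutable string[] names = ["Naive", "SIMD", "SIMDM", "SIMDU", "C"];
    immutable long[] ticks = [136456491, 18920182, 15232176, 11210454, 6233963];
    foreach (i, name; names)
        writefln("%-7s%.3g", name ~ ":", relative(ticks[i], ticks[$ - 1]));
}
```

Rounded to three significant digits, the ratios agree with the posted figures.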

memchrSIMD splits the input into three parts: an unaligned beginning, an aligned middle, and an unaligned end.
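The split can be sketched as plain address arithmetic. This is only an illustration under the assumption of 16-byte alignment; `Split` and `splitForSimd` are hypothetical names, and the real code is in the dpaste link above:

```d
// Hypothetical result type (not from the original code): byte counts of each part.
struct Split
{
    size_t head; // bytes before the first aligned boundary
    size_t mid;  // bytes in whole aligned blocks
    size_t tail; // leftover bytes after the last whole block
}

// Split `len` bytes starting at address `addr`.
// `alignment` must be a power of two (16 for SSE registers).
Split splitForSimd(size_t addr, size_t len, size_t alignment = 16)
{
    immutable misalign = addr & (alignment - 1);
    // Distance to the next boundary, or 0 if already aligned.
    size_t head = misalign == 0 ? 0 : alignment - misalign;
    if (head > len)
        head = len; // the input ends before the first boundary
    immutable rest = len - head;
    immutable tail = rest & (alignment - 1);
    return Split(head, rest - tail, tail);
}

void main()
{
    // Made-up addresses are enough here; nothing is dereferenced.
    assert(splitForSimd(0x1003, 100) == Split(13, 80, 7));
    assert(splitForSimd(0x1000, 32) == Split(0, 32, 0));
}
```

With a real buffer the address would come from `cast(size_t) ptr`. The head and tail are searched byte by byte, and only the middle goes through the SIMD loop.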


memchrSIMDM uses this code instead of pmovmskb:
------
        if (Mask mask = *cast(Mask*)(result.array.ptr))
        {
            return ptr + bsf(mask) / BitsInByte;
        }
        else if (Mask mask = *cast(Mask*)(result.array.ptr + Mask.sizeof))
        {
            return ptr + bsf(mask) / BitsInByte + cast(int)Mask.sizeof;
        }
------
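To show why dividing the bit index by BitsInByte gives the byte offset, here is a self-contained sketch of the same idea. It assumes Mask is a 64-bit integer and a little-endian target such as x86; the `Lanes` union and `firstMatch` are hypothetical and stand in for the vector result:

```d
import core.bitop : bsf;

enum BitsInByte = 8;

// Hypothetical view of one compare result: 16 byte lanes or two 64-bit halves.
union Lanes
{
    ubyte[16] bytes;
    ulong[2] halves;
}

// Index of the first nonzero lane, or -1 if every lane is zero.
// Relies on little-endian layout: the lowest set bit of a half
// belongs to its lowest-addressed nonzero byte.
int firstMatch(Lanes r)
{
    if (r.halves[0] != 0)
        return bsf(r.halves[0]) / BitsInByte;
    if (r.halves[1] != 0)
        return bsf(r.halves[1]) / BitsInByte + 8;
    return -1;
}

void main()
{
    Lanes r;            // all lanes start at zero
    assert(firstMatch(r) == -1);
    r.bytes[10] = 0xFF; // pretend lane 10 compared equal
    assert(firstMatch(r) == 10);
    r.bytes[3] = 0xFF;  // an earlier match wins
    assert(firstMatch(r) == 3);
}
```

The union is used here only to avoid pointer casts in the example; the code in the post reads the halves through a cast instead.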

memchrSIMDU (unaligned) applies SIMD instructions starting from the first array elements.


SIMD part of function:
------
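        // Broadcast the byte being searched for into all 16 lanes.
        ubyte16 niddles;
        niddles.ptr[0..16] = value;
        ubyte16 result;
        ubyte16 arr;

        for (; ptr < alignedEnd; ptr += 16)
        {
            // Load the next 16 bytes of input.
            arr.ptr[0..16] = ptr[0..16];
            // Each lane becomes 0xFF where the bytes are equal, 0 otherwise.
            result = __builtin_ia32_pcmpeqb128(arr, niddles);
            // Collect the top bit of every lane into a 16-bit mask.
            int i = __builtin_ia32_pmovmskb128(result);
            if (i != 0)
            {
                // The lowest set bit is the offset of the first match.
                return ptr + bsf(i);
            }
        }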
------
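For readers without gdc, the compare-and-mask step can be imitated in scalar D. This is a sketch of what the two builtins compute together, not a replacement for them; `matchMask` and `findInBlocks` are hypothetical names, and the loop only covers whole 16-byte blocks:

```d
import core.bitop : bsf;

// Scalar stand-in for pcmpeqb followed by pmovmskb:
// bit i of the result is set when block[i] == value.
uint matchMask(const(ubyte)[] block, ubyte value)
{
    assert(block.length == 16);
    uint mask = 0;
    foreach (i, b; block)
        if (b == value)
            mask |= 1u << cast(uint) i;
    return mask;
}

// Index of the first occurrence of `value` within the whole
// 16-byte blocks of `data`, or -1 if there is none.
ptrdiff_t findInBlocks(const(ubyte)[] data, ubyte value)
{
    for (size_t off = 0; off + 16 <= data.length; off += 16)
    {
        immutable mask = matchMask(data[off .. off + 16], value);
        if (mask != 0)
            return cast(ptrdiff_t)(off + bsf(mask)); // lowest set bit = first match
    }
    return -1;
}

void main()
{
    ubyte[32] data;
    data[20] = 7;
    data[25] = 7;
    assert(matchMask(data[16 .. 32], 7) == ((1u << 4) | (1u << 9)));
    assert(findInBlocks(data[], 7) == 20);
    assert(findInBlocks(data[], 9) == -1);
}
```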
