On Sun, 28 Nov 2010 06:44:23 -0500, bearophile <bearophileh...@lycos.com> wrote:
Robert Jacques:

I've spent some time having fun this afternoon optimizing array-equals
using vectorization techniques. I found that vectorizing using ulongs
worked best on my system except with the shortest strings, where a simple Duff's device edged it out. If you'd like to try it out on your data set:

Thank you for your work :-)
A version with your function, D version #8:

[snip]

Your function can't be inlined because it's big, so this code isn't faster than inlined code like this generated by the compiler: (codon.length == 3 && codon[0] == 'T' && codon[1] == 'A' && codon[2] == 'G')

Bye,
bearophile

Still, part of the point was that string comparisons in general were way too slow. Anyways, I've applied the same technique in a partially unrolled version if you want to check it out:

bool arrayComp(T, size_t N)(const T[] a, ref const T[N] b) pure nothrow {
    static if(T.sizeof*N <= uint.sizeof) {
return a.length == N && !( (*(cast(uint*)a.ptr) ^ *(cast(uint*)b.ptr)) & (uint.max >> 8*(uint.sizeof - T.sizeof*N) ));
    } else static if(T.sizeof*N <= ulong.sizeof) {
return a.length == N && !( (*(cast(ulong*)a.ptr) ^ *(cast(ulong*)b.ptr)) & (ulong.max>> 8*(ulong.sizeof - T.sizeof*N) ));
    } else { // Fall back to a loop
if(a.length != N || (*(cast(ulong*)a.ptr) != *(cast(ulong*)b.ptr)) ) return false; enum alignment = T.sizeof*N % ulong.sizeof > 0 ? T.sizeof*N % ulong.sizeof : ulong.sizeof;
        auto pa      = cast(ulong*)(cast(ubyte*)a.ptr + alignment);
        auto pb      = cast(ulong*)(cast(ubyte*)b.ptr + alignment);
        auto pa_end  = cast(ulong*)(cast(ubyte*)a.ptr + T.sizeof*N);
        while (pa < pa_end) {
            if(*pa++ != *pb++ ) return false;
        }
        return true;
    }
}

Be warned that you'll have to use strings explicitly typed as immutable char[N], since both array literals and string literals won't match a this template.

Reply via email to