On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
And if you're not simply comparing for equality, what are you
looking to figure out? Without more information about what
you're trying to do, it's kind of hard to help you.
If I write the comparison naively, the assembly clearly shows
a "movzbl" [0]. It loads a single byte! The other single byte
load is encoded in the address mode of "cmp". Implementation:
bool stringcmp(string x, string y) {
foreach(i; 0..x.length) {
if (x[i] != y[i]) // byte compare
return false;
}
return true;
}
It makes no sense to load single bytes here. Since we only
want to check for equality, we could load two full words and
compare four or eight bytes in one go.
Ok, to answer my own question, this looks good:
bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y)
{
pragma(inline, false);
if (x.length != y.length) return false;
int i=0;
// word-wise compare is faster than byte-wise
if (x.length > size_t.sizeof)
for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
size_t* xw = cast(size_t*) &x[i];
size_t* yw = cast(size_t*) &x[i];
if (*xw != *yw) return false;
}
// last sub-word part
for (; i < x.length; i+=1) {
if (x[i] != y[i]) // byte compare
return false;
}
return true;
}
Any comments or recommendations?
I don't know if this would be faster, but here is my attempt.
It assumes the arrays start at an address multiple of 8.
if (x is y)
return true;
if (x.length != y.length)
return false;
size_t l = x.length;
ubyte* a = x.ptr, b = y.ptr;
for (size_t n = l>>3; n != 0; --n, a+=8, b+=8)
if (*cast(long*)a ^ *cast(long*)b)
return false;
if (l & 4)
{
if (*cast(int*)a ^ *cast(int*)b)
return false;
a+= 4;
b+= 4;
}
if (l & 2)
{
if (*cast(short*)a ^ *cast(short*)b)
return false;
a+=2;
b+=2;
}
return (l & 1) && (*a ^ *b) ? false : true;
If the pointers are not on an address multiple of 8, one has to
inverse the trailing tests to consume the bytes in front of the
array until the address becomes a multiple of 8.
The trailing tests could eventually be replaced by a simple
sequential byte compare. I don't know which is faster.