On Thursday, 7 June 2012 at 22:29:09 UTC, Andrei Alexandrescu wrote:
Great points, example could be a lot better. Maybe it's time you do get started on find(). Destroy or be destroyed.

Ok...

This overload:

R find(alias pred = "a == b", R, E)(R haystack, E needle)
if (isInputRange!R &&
        is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
{
    for (; !haystack.empty; haystack.popFront())
    {
        if (binaryFun!pred(haystack.front, needle)) break;
    }
    return haystack;
}


Is fine. In fact it's practically perfect. It's the obvious solution to a simple problem. The biggest problem is the "is typeof" syntax, but that's not your fault.

Second overload:

R1 find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
if (isForwardRange!R1 && isForwardRange!R2
&& is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
        && !isRandomAccessRange!R1)
{
static if (is(typeof(pred == "a == b")) && pred == "a == b" && isSomeString!R1 && isSomeString!R2
            && haystack[0].sizeof == needle[0].sizeof)
    {
//return cast(R1) find(representation(haystack), representation(needle));
        // Specialization for simple string search
        alias Select!(haystack[0].sizeof == 1, ubyte[],
Select!(haystack[0].sizeof == 2, ushort[], uint[]))
            Representation;
        // Will use the array specialization
return cast(R1) .find!(pred, Representation, Representation) (cast(Representation) haystack, cast(Representation) needle);
    }
    else
    {
        return simpleMindedFind!pred(haystack, needle);
    }
}

As far as I can tell, this is a proxy for various substring search implementations.

Problem 1: Why is find overloaded to do substring search? Why does it do substring and not subset or subsequence? substring is a completely different algorithm from linear search and even has different asymptotic running time. This is needlessly overloaded, it adds nothing.

Problem 2: The attempted specialisation is atrocious. It compares the predicate string with "a == b". I just did a quick check, and this means that these two calls DO NOT use the same algorithm:

string a = ..., b = ...;
auto fast = find!"a == b"(a, b);
auto slow = find!"a==b"(a, b);

I have to be honest, I have only just noticed this, but that's really sloppy.

It's also a direct symptom of over-complicated code. As a user, I would 100% expect these calls to be the same. If I accidentally used the second version, I would have no warning: my code would just run slower and I'd be none the wiser. Only upon careful inspection of the source could you discover what this "simple" code does, and you would be horrified like I am now.

The next two overloads of find appears to implement a couple of reasonably fast substring. Again, it would take me a minute or so to figure out exactly what algorithm was being used.

After that there's a multi-range find. Seems simple enough, but it's yet another overload to consider and I'm not sure it's commonly used enough to even warrant existence. I'd hate to think what would happen if I wanted the single-range search but accidentally added an extra parameter.

In summary: the specialisation detection is shocking, which leads me to question what other static ifs are accidentally firing or failing. If the code was more simple, and less overloaded it would be easy to reason about all this, but it's not. The various find implementations span a few hundred lines, and all have numerous compile time branches and checks. The cyclomatic complexity must be huge.

How I'd do things:

- The first version of find would be the only one. No overloads, no specialisation. - substring would be a separate, non-predicated function. It would be specialised for strings. I'm too tired now to think about the variety of range specialisations, but I'm sure there's a more elegant way to handle to combinations.
- Drop the variadic find altogether.
- Get rid of BoyerMooreFinder + find overload, replace with a boyerMoore function.

Reply via email to