On Friday, 19 May 2017 at 12:21:10 UTC, biocyberman wrote:

1. Why do we need to use assumeUnique in 'revComp0' and 'revComp3'?

D strings are immutable, so if I'd created the result array as a string, I couldn't change the individual characters. Instead, I create a mutable array, change the elements in it, then cast it to immutable when I'm done. assumeUnique does that casting while keeping other type information and arguably providing better documentation through its name. Behind the scenes, it's basically doing cast(string)result; and, when given a variable, it also sets that variable to null so the mutable reference can't be used by accident afterwards.
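To make that concrete, here is a minimal sketch of the pattern outside of revComp. The helper name repeatChar is made up for illustration; only assumeUnique from std.exception comes from the code in this thread.

```d
import std.exception : assumeUnique;

// Hypothetical helper: returns a string made of n copies of c.
string repeatChar(char c, size_t n) {
    char[] buf = new char[n];    // mutable while we fill it
    buf[] = c;                   // write into every element
    string s = buf.assumeUnique; // now typed as immutable(char)[]
    assert(buf is null);         // the mutable reference has been cleared
    return s;
}
```

The promise you make by calling assumeUnique is that nobody else holds a mutable reference to the same memory; the compiler does not check this for you.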

2. What is going on with the trick of making chars enum like that in 'revComp3'?

By marking a symbol enum, we tell the compiler that its value should be calculated at compile-time. It's a bit of an optimization (but probably doesn't matter at all, and should be done by the compiler anyway), and a way to say it's really, really const. :p

Mostly, it's a habit I try to build, of declaring symbols as const as possible, to make maintenance easier.
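A small sketch of what enum buys you, assuming a made-up helper called complement (not the actual code from revComp3):

```d
// Hypothetical helper: an ordinary function that can also run at compile time.
char complement(char c) pure nothrow @nogc @safe {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default: assert(false);
    }
}

// enum forces the initializer to be evaluated at compile time (CTFE).
// The result is a manifest constant: it has no address and is pasted
// in as a literal wherever it is used.
enum compOfA = complement('A');

// static assert can only check compile-time values, so this line
// confirms the value really was computed during compilation.
static assert(compOfA == 'T');
```

One thing to keep in mind: an enum array literal is allocated anew at each place it is used, so for larger tables a static immutable array is often the better choice.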


Bonus! Three more variations, all faster than revComp0:

import std.exception : assumeUnique;

string revComp4(string bps) {
    const N = bps.length;
    char[] result = new char[N];
    // size_t matches the type of N and avoids a signed/unsigned comparison.
    for (size_t i = 0; i < N; ++i) {
        switch(bps[N-i-1]) {
            case 'A': result[i] = 'T'; break;
            case 'C': result[i] = 'G'; break;
            case 'G': result[i] = 'C'; break;
            case 'T': result[i] = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}

string revComp5(string bps) {
    const N = bps.length;
    char[] result = new char[N];
    foreach (i, ref e; result) {
        switch(bps[N-i-1]) {
            case 'A': e = 'T'; break;
            case 'C': e = 'G'; break;
            case 'G': e = 'C'; break;
            case 'T': e = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}

string revComp6(string bps) {
    char[] result = new char[bps.length];
    auto p1 = result.ptr;
    // Start one past the last character; this also copes with an empty input.
    auto p2 = bps.ptr + bps.length;

    while (p2 > bps.ptr) {
        --p2; // step back first, so that bps[0] is processed as well
        switch(*p2) {
            case 'A': *p1 = 'T'; break;
            case 'C': *p1 = 'G'; break;
            case 'G': *p1 = 'C'; break;
            case 'T': *p1 = 'A'; break;
            default: assert(false);
        }
        p1++;
    }
    return result.assumeUnique;
}

revComp6 seems to be the fastest, but it's probably also the least readable (a common trade-off).
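
If you want to check timings like these yourself, here is a rough sketch using benchmark from std.datetime.stopwatch. The function names revCompIndex and revCompPtr, the input size and the iteration count are all arbitrary choices for illustration, and the numbers will vary with compiler and flags.

```d
import std.datetime.stopwatch : benchmark;
import std.exception : assumeUnique;
import std.stdio : writeln;

// Index-based variant (hypothetical name), same shape as revComp5.
string revCompIndex(string bps) {
    char[] result = new char[bps.length];
    foreach (i, ref e; result) {
        switch (bps[$ - i - 1]) {
            case 'A': e = 'T'; break;
            case 'C': e = 'G'; break;
            case 'G': e = 'C'; break;
            case 'T': e = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}

// Pointer-based variant (hypothetical name), walking the input backwards.
string revCompPtr(string bps) {
    char[] result = new char[bps.length];
    auto dst = result.ptr;
    auto src = bps.ptr + bps.length; // one past the last character
    while (src > bps.ptr) {
        --src;
        switch (*src) {
            case 'A': *dst = 'T'; break;
            case 'C': *dst = 'G'; break;
            case 'G': *dst = 'C'; break;
            case 'T': *dst = 'A'; break;
            default: assert(false);
        }
        ++dst;
    }
    return result.assumeUnique;
}

void main() {
    // Build a 40,000-character test input.
    string input;
    foreach (_; 0 .. 10_000) input ~= "ACGT";

    // Both variants must agree before the timings mean anything.
    assert(revCompIndex(input) == revCompPtr(input));

    void runIndex() { auto r = revCompIndex(input); }
    void runPtr()   { auto r = revCompPtr(input); }

    // Runs each function 1,000 times and returns one Duration per function.
    auto times = benchmark!(runIndex, runPtr)(1_000);
    writeln("index:   ", times[0]);
    writeln("pointer: ", times[1]);
}
```

Both versions allocate the result on every call, so part of what gets measured is the allocation itself rather than the loop.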
