On Friday, 19 May 2017 at 12:21:10 UTC, biocyberman wrote:
> 1. Why do we need to use assumeUnique in 'revComp0' and
> 'revComp3'?
A D string is an immutable(char)[], so if I'd created the result
array as a string, I couldn't change the individual characters.
Instead, I create a mutable char[], change the elements in it,
then cast it to immutable when I'm done. assumeUnique does that
casting while keeping other type information and arguably
providing better documentation through its name. Behind the
scenes, it's basically doing cast(string)result;
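As a minimal sketch of that build-then-freeze pattern (repeatChar is a made-up helper for illustration, not one of the revComp functions from this thread):

```d
import std.exception : assumeUnique;

// Hypothetical helper: returns a string made of n copies of c.
string repeatChar(char c, size_t n) {
    char[] buf = new char[n];   // mutable while we fill it
    foreach (ref e; buf)
        e = c;
    // buf is the only reference to this memory, so treating it
    // as immutable from here on is safe.
    return buf.assumeUnique;
}

void main() {
    string s = repeatChar('A', 3);
    assert(s == "AAA");
    // s[0] = 'T';  // would not compile: elements of a string are immutable
}
```

The important part is that no other mutable reference to the buffer survives the call. If one did, the "immutable" string could still change behind your back.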
> 2. What is going on with the trick of making chars enum like
> that in 'revComp3'?
By marking a symbol enum, we tell the compiler that its value
should be calculated at compile-time. It's a bit of an
optimization (but probably doesn't matter at all, and should be
done by the compiler anyway), and a way to say it's really,
really const. :p
Mostly, it's a habit I try to build, of declaring symbols as
const as possible, to make maintenance easier.
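A small sketch of the difference, assuming a made-up complement function (it is only here for illustration and is not the code from revComp3):

```d
// Hypothetical helper: complement of a single base.
char complement(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default: assert(false);
    }
}

// enum forces the initializer to be evaluated at compile time (CTFE).
enum char compA = complement('A');
static assert(compA == 'T');   // checked by the compiler, not at run time

void main() {
    // An ordinary local is initialized by a normal run-time call.
    immutable char runtimeA = complement('A');
    assert(runtimeA == compA);
    // compA = 'G';  // would not compile: an enum is not an lvalue
}
```

If the initializer can't be evaluated at compile time, the enum declaration is a compile error, so it also works as a check that the value really is a constant.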
Bonus! Three more variations, all faster than revComp0:
import std.exception : assumeUnique;

string revComp4(string bps) {
    const N = bps.length;
    char[] result = new char[N];
    // size_t matches the type of N, so long inputs can't overflow the index
    for (size_t i = 0; i < N; ++i) {
        switch(bps[N-i-1]) {
            case 'A': result[i] = 'T'; break;
            case 'C': result[i] = 'G'; break;
            case 'G': result[i] = 'C'; break;
            case 'T': result[i] = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}
string revComp5(string bps) {
    const N = bps.length;
    char[] result = new char[N];
    foreach (i, ref e; result) {
        switch(bps[N-i-1]) {
            case 'A': e = 'T'; break;
            case 'C': e = 'G'; break;
            case 'G': e = 'C'; break;
            case 'T': e = 'A'; break;
            default: assert(false);
        }
    }
    return result.assumeUnique;
}
string revComp6(string bps) {
    char[] result = new char[bps.length];
    auto p1 = result.ptr;
    // Start one past the last char and step back before each read, so
    // the first char is not skipped and empty input is handled too.
    auto p2 = bps.ptr + bps.length;
    while (p2 > bps.ptr) {
        p2--;
        switch(*p2) {
            case 'A': *p1 = 'T'; break;
            case 'C': *p1 = 'G'; break;
            case 'G': *p1 = 'C'; break;
            case 'T': *p1 = 'A'; break;
            default: assert(false);
        }
        p1++;
    }
    return result.assumeUnique;
}
revComp6 seems to be the fastest, but it's probably also the
least readable (a common trade-off).
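Since the less readable versions are also the easiest to get subtly wrong, here's a rough sketch of how one might check any variant against a slow but obvious reference. revCompRef and agreesWithReference are names I made up for this sketch, and the inputs are just a few edge cases (empty, one char, a few short ones):

```d
// Hypothetical reference implementation: slow, but easy to verify by eye.
string revCompRef(string bps) {
    string result;
    foreach_reverse (char c; bps) {   // walk the code units back to front
        switch (c) {
            case 'A': result ~= 'T'; break;
            case 'C': result ~= 'G'; break;
            case 'G': result ~= 'C'; break;
            case 'T': result ~= 'A'; break;
            default: assert(false);
        }
    }
    return result;
}

// Hypothetical checker: true if candidate matches the reference on all inputs.
bool agreesWithReference(string function(string) candidate) {
    foreach (input; ["", "A", "AC", "ACGT", "GATTACA"])
        if (candidate(input) != revCompRef(input))
            return false;
    return true;
}

void main() {
    assert(revCompRef("GATTACA") == "TGTAATC");
    assert(agreesWithReference(&revCompRef));
    // To test a variant: assert(agreesWithReference(&revComp6));
}
```

The empty and single-character inputs are the ones where index arithmetic and pointer loops most often go off by one.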