On Friday, 19 May 2017 at 09:17:04 UTC, Biotronic wrote:
On Friday, 19 May 2017 at 07:29:44 UTC, biocyberman wrote:
[...]
Question about your implementation: you assume the input may
contain newlines, but don't handle any other non-ACGT
characters. The problem definition states 'DNA string' and the
sample dataset contains no non-ACGT chars. Is this an oversight
on my part or yours, or did you just decide to support more than
the problem requires?
[...]
Firstly, thank you for showing me various solutions, and even
cool benchmark code. To answer your questions: yes, I assume the
input file would realistically contain newlines, even though the
problem does not care about them. I also thought about non-ACGT
bases, but haven't handled those cases. In reality we
should deal with at least the ambiguous base N.
I ran your code and also see that the switch is faster than the
associative array (AA) lookup, i.e. revComp0 is the fastest.
Stefan is right about this.
Some follow-up questions:
1. Why do we need to use assumeUnique in 'revComp0' and
'revComp3'?
2. What is going on with the trick of making the chars an enum
like that in 'revComp3'?