Folks, I'm aware of how UTF-8 is defined. Nim strings are stored as arrays of
8-bit values, whatever you want to call them. In fact, when you index a string,
you get the 8-bit value, not the Unicode character. Given those facts, what
astonishes me is how hard it is to zip two strings interpreted as ASCII, or
even as 8-bit integers.
In my case, all the values are ASCII, so each UTF-8 byte is exactly one character.
That's why I do not care about encodings. I like the `runes` function, but I
don't see why I cannot call `toSeq(string)` to get a sequence of 8-bit values
(`char`, `uint8`, or something like that).
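For reference, here is a minimal sketch of getting both views out of a string, assuming a recent Nim with `std/sequtils`. A string's built-in `items` iterator yields one `char` per byte, so `toSeq(s.items)` collects them, and `mapIt` converts each byte to `uint8`. The sample string is hypothetical.

```nim
import std/sequtils

let s = "ACGT"  # hypothetical sample input, ASCII only

# Strings have a built-in `items` iterator that yields one `char` per byte.
let chars: seq[char] = toSeq(s.items)

# The same bytes viewed as unsigned 8-bit integers.
let bytes: seq[uint8] = s.mapIt(uint8(it))

echo chars
echo bytes
```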
**@OderWat**, very interesting. Thanks. Is that equivalent to (but less
efficient than) converting each of `runes()` to a string?
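The byte view and the rune view only diverge for non-ASCII text, which is easy to check. This is a minimal sketch assuming `std/unicode` in a recent Nim; both strings are hypothetical. For ASCII the two counts agree, while `é` takes two bytes but is a single rune. Turning each rune into its own string is just `$` applied per rune.

```nim
import std/[sequtils, unicode]

let ascii = "ACGT"  # hypothetical ASCII input
let mixed = "Aé"    # hypothetical non-ASCII input; 'é' is 2 bytes in UTF-8

# For pure ASCII, one byte is one rune, so both counts agree.
echo ascii.len, " ", ascii.runeLen

# For non-ASCII text, `len` counts bytes while `runeLen` counts code points.
echo mixed.len, " ", mixed.runeLen

# Each rune converted to a one-rune string.
let runeStrs = toSeq(mixed.runes).mapIt($it)
echo runeStrs
```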
**@wiffel**, thanks. That works. But do I really need `map`?
These also work:
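```nim
import sequtils

# Copy each byte of `s` into a seq[char].
proc charSeq(s: string): seq[char] =
  result = newSeq[char](s.len)
  for i in 0 .. s.high:
    result[i] = s[i]

# dna_norm and dna_comp are assumed to be defined earlier.
let
  a = charSeq(dna_norm)
  b = charSeq(dna_comp)
  rcmap = sequtils.zip(a, b)
```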
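Judging by the names, `rcmap` is presumably meant as a lookup for reverse complements. Here is a self-contained sketch under that assumption. The values of `dnaNorm`/`dnaComp` and the `reverseComplement` helper are hypothetical, and the zipped pairs are loaded into a `Table` from `std/tables`.

```nim
import std/[sequtils, tables]

# Hypothetical inputs: the real dna_norm / dna_comp are not shown here,
# so a base alphabet and its complement are assumed.
let dnaNorm = "ACGT"
let dnaComp = "TGCA"

proc charSeq(s: string): seq[char] =
  result = newSeq[char](s.len)
  for i in 0 .. s.high:
    result[i] = s[i]

# rcmap holds the pairs ('A', 'T'), ('C', 'G'), ('G', 'C'), ('T', 'A').
let rcmap = zip(charSeq(dnaNorm), charSeq(dnaComp))

# Load the pairs into a Table for lookups by base.
var compOf = initTable[char, char]()
for p in rcmap:
  compOf[p[0]] = p[1]

# Hypothetical helper: complement every base and reverse the order.
proc reverseComplement(dna: string): string =
  result = newString(dna.len)
  for i, c in dna:
    result[dna.high - i] = compOf[c]

echo reverseComplement("AACG")  # CGTT
```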
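```nim
import sequtils

# Yield each byte of `s` as a char, one at a time.
iterator charYield(s: string): char {.inline.} =
  for i in 0 .. s.high:
    yield s[i]

# dna_norm and dna_comp are assumed to be defined earlier.
let
  a = sequtils.toSeq(charYield(dna_norm))
  b = sequtils.toSeq(charYield(dna_comp))
  rcmap = sequtils.zip(a, b)
```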
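Taking the iterator idea one step further, the two strings can be paired lazily without building `a` and `b` first. This is a minimal sketch: `zipChars` is a hypothetical helper, not a standard library iterator, and like `sequtils.zip` it stops at the end of the shorter input. The input strings are also hypothetical.

```nim
import std/sequtils

# Hypothetical helper: yields (a[i], b[i]) pairs, stopping at the shorter string.
iterator zipChars(a, b: string): (char, char) =
  for i in 0 ..< min(a.len, b.len):
    yield (a[i], b[i])

let dnaNorm = "ACGT"  # hypothetical inputs
let dnaComp = "TGCA"

# Iterate the pairs directly, with no intermediate seqs...
for p in zipChars(dnaNorm, dnaComp):
  echo p[0], " -> ", p[1]

# ...or collect them only when a seq is really needed.
let rcmap = toSeq(zipChars(dnaNorm, dnaComp))
```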
I kind of think that `charSeq` or `charYield` should be in the standard
library, since converting a `string` to a sequence of `char` is a common need.
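For what it's worth, newer Nim releases appear to cover this directly. This is an assumption based on current `system` and `std/sequtils`, not necessarily the version used in this thread. `@` turns any `openArray`, including a string, into a `seq`, and because `zip` accepts `openArray` parameters it should take the two strings as they are. The inputs are hypothetical.

```nim
import std/sequtils

let dnaNorm = "ACGT"  # hypothetical inputs
let dnaComp = "TGCA"

# `@` copies a string (viewed as openArray[char]) into a seq[char].
let a: seq[char] = @dnaNorm

# `zip` takes openArray parameters, so plain strings can be passed directly.
let rcmap = zip(dnaNorm, dnaComp)
```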
And this is interesting, if we want an array:
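```nim
# `s` must be a compile-time constant (the {`const`} parameter constraint),
# because the array bounds are computed from s.high.
# (Older Nim versions spelled the `untyped` return type as `expr`.)
template toArrayChars(s: string{`const`}): untyped =
  type
    x = array[0..s.high, char]
  var
    res: x
  for i in 0 .. s.high:
    res[i] = s[i]
  res

# dna_norm and dna_comp must be declared with `const` for this to compile.
var
  a = toArrayChars(dna_norm)
  b = toArrayChars(dna_comp)
```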
That uses _Parameter Constraints_. I'm starting to see the power of Nim.
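If parameter constraints feel too magical, a generic proc with a `static int` length is another way to get a fixed-size array. This is a swapped-in technique, not the template above. `toCharArray` is a hypothetical name, the input string is hypothetical, and the length must still be known at compile time.

```nim
const dnaNorm = "ACGT"    # hypothetical compile-time input
const n = dnaNorm.len     # array length, fixed at compile time

# N is a compile-time length; the returned array has exactly N chars.
proc toCharArray[N: static int](s: string): array[N, char] =
  doAssert s.len == N, "length mismatch"
  for i in 0 ..< N:
    result[i] = s[i]

let a = toCharArray[n](dnaNorm)
echo a.len
```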