Re: zipping strings

cdunn2001 Wed, 13 Jul 2016 00:45:03 +0200

Folks, I'm aware of the definition of UTF-8. Nim strings are stored as arrays of 
8-bit values, whatever you want to call them. In fact, when you index a string, 
you get the 8-bit value, not the Unicode character. Given those facts, what 
astonishes me is the difficulty of zipping two strings interpreted as ASCII, or 
even as 8-bit integers.
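
To make the byte-versus-character point concrete, here is a minimal sketch. The sample string is my own, not something from the thread, and I spell the non-ASCII letter as hex escapes so the example does not depend on the source file's encoding.

```nim
import unicode

# "h\xC3\xA9llo" is "héllo" with the é written as its two UTF-8 bytes.
# (Hypothetical sample value, chosen only for illustration.)
let s = "h\xC3\xA9llo"

let byteCount = s.len       # number of 8-bit values stored in the string
let runeCount = s.runeLen   # number of Unicode code points
let second: char = s[1]     # indexing yields one byte: the first byte of é

echo byteCount, " bytes, ", runeCount, " runes, s[1] = ", ord(second)
```

For pure ASCII input the two counts are equal, which is why treating each `char` as one character is safe in that case.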


In my case, all the values are ASCII, so each UTF-8 code unit is exactly one 
8-bit character. That's why I do not care about encodings. I like the `runes` 
function, but I don't see why I cannot call `toSeq(string)` to get a sequence 
of 8-bit numbers (`char`, or `uint8`, or something like that).

**@OderWat**, very interesting. Thanks. Is that equivalent to (but less 
efficient than) converting each of `runes()` to a string?

**@wiffel**, thanks. That works. But do I really need `map`?

These also work:
    
    
    import sequtils
    
    # Copy each 8-bit char of the string into a new seq.
    proc charSeq(s: string): seq[char] =
      result = newSeq[char](s.len)
      for i in 0 .. s.high:
        result[i] = s[i]
    let a = charSeq(dna_norm)
    let b = charSeq(dna_comp)
    let rcmap = sequtils.zip( a, b )
    
    
    
    import sequtils
    
    # Yield the string one 8-bit char at a time.
    iterator charYield(s: string): char {.inline.} =
      for i in 0 .. s.high:
        yield s[i]
    let a = sequtils.toSeq(charYield(dna_norm))
    let b = sequtils.toSeq(charYield(dna_comp))
    let rcmap = sequtils.zip( a, b )
    

I kind of think that `charSeq` or `charYield` should be in the standard 
library, since it is a common goal to convert a `string` to a sequence of `char`.
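
For comparison, here is a sketch that leans on the built-in `items` iterator for strings, which yields one `char` at a time, so no helper is needed. The two DNA strings are made-up sample values; the real `dna_norm` and `dna_comp` come from earlier in the thread. I am assuming `toSeq` accepts an iterator call in the Nim version at hand.

```nim
import sequtils

# Hypothetical sample data standing in for the thread's dna_norm / dna_comp.
let dna_norm = "ACGT"
let dna_comp = "TGCA"

let a = toSeq(dna_norm.items)   # seq[char], one element per 8-bit char
let b = toSeq(dna_comp.items)
let rcmap = zip(a, b)           # seq of (char, char) pairs

for pair in rcmap:
  echo pair[0], " -> ", pair[1]
```

The pairs are read by index (`pair[0]`, `pair[1]`) rather than by field name, since the tuple field names returned by `zip` may differ between Nim versions.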

And this is interesting, if we want an array: 
    
    
    # `expr` is the old spelling of `untyped`.
    template toArrayChars(s: string{`const`}): untyped =
      type
        x = array[0..s.high, char]
      var
        res: x
      for i in 0 .. s.high:
        res[i] = s[i]
      res
    var
      a = toArrayChars(dna_norm)
      b = toArrayChars(dna_comp)
    

That uses _Parameter Constraints_. I'm starting to see the power of Nim.
