**Question/food-for-thought:**
What would be the most sensible way to go about supporting Unicode values in a
lexer that uses `lexbase` / `BaseLexer` as its basis?
I mean, as far as I can tell, using `BaseLexer` we can load our string/input
into a buffer and move through it byte by byte:
```nim
while true:
  setLen(p.value, 0)
  case p.buf[p.bufpos]
  of someChar:
    ...
```
But the whole thing becomes considerably more complicated when the "char" we
are after, even if it's logically a single character, is a multi-byte Unicode
character. In that case I end up testing for a series of bytes (like
`p.buf[p.bufpos]`, `p.buf[p.bufpos+1]`, etc.).
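To make that concrete, here's a simplified sketch of what matching even one such character turns into (`p` stands for the parser/lexer object as above; the character `∞` is just an illustration — UTF-8 encodes it as the three bytes `E2 88 9E`):

```nim
# hypothetical sketch: recognizing a single Unicode char ("∞", U+221E)
# means matching its UTF-8 bytes one by one
case p.buf[p.bufpos]
of '\xE2':
  if p.buf[p.bufpos+1] == '\x88' and p.buf[p.bufpos+2] == '\x9E':
    # found "∞": consume all three bytes
    inc(p.bufpos, 3)
else:
  discard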
Here's an example of what I'm talking about:
<https://github.com/arturo-lang/arturo/blob/master/src/vm/parse.nim#L945-L967>
...which looks rather ugly, and isn't easy to debug or reason about either.
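For comparison, one direction that seems cleaner is decoding a whole `Rune` at the current position with `std/unicode`'s `fastRuneAt` (which also advances the index past however many bytes the character took). A rough standalone sketch, not actually hooked into `BaseLexer`'s buffer handling:

```nim
import std/unicode

# standalone sketch: walk a string rune-by-rune the way a
# BaseLexer-style loop could, instead of byte-by-byte
let input = "a∞b"
var pos = 0
while pos < input.len:
  var r: Rune
  fastRuneAt(input, pos, r, true)  # decodes r, advances pos by 1..4 bytes
  if r == Rune(0x221E):
    echo "found ∞ in a single step"
  else:
    echo "rune: ", r
```

But I'm not sure how well that plays with `lexbase`'s buffer refilling at chunk boundaries.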
So, how would you go about it?