I am trying to parse a stream from a TCP connection.
I think the data is UTF-8; here is a sample:
20 2d 20 c8 65 73 6b fd 20 72 6f 7a 68 6c 61 73
which, when I print it, gives:
- e s k r o z h l a s
 ^     ^
 missing (at both marked positions)
There are two missing characters. OK, bad UTF-8 perhaps?
But when I try unicode(1) I see:
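
To check the "bad UTF-8" guess rather than trust my terminal, here is a minimal sketch, assuming Python 3 is to hand. The names SAMPLE and first_invalid_utf8 are mine, purely for illustration; the bytes are the ones from the dump above.

```python
# Sample bytes copied from the hex dump above.
SAMPLE = bytes.fromhex("20 2d 20 c8 65 73 6b fd 20 72 6f 7a 68 6c 61 73")


def first_invalid_utf8(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Bytes above 0x7f are the only candidates for trouble.
high_bytes = [hex(b) for b in SAMPLE if b > 0x7F]

print(high_bytes)                  # ['0xc8', '0xfd']
print(first_invalid_utf8(SAMPLE))  # 3
```

Offset 3 is the c8 byte: in UTF-8 it would have to start a two-byte sequence, but the next byte (65) is not a continuation byte, so the sample is not valid UTF-8 as it stands.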
unicode c8 fd
È
ý
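
As far as I can tell, unicode(1) treats its arguments as code points, not as bytes of a stream. A rough sketch of the comparison, again assuming Python 3 (the helper name compare is made up for this post):

```python
def compare(value: int):
    """Show one 8-bit value three ways.

    Returns (character for code point `value`,
             character when the single byte is decoded as Latin-1,
             hex of the UTF-8 encoding of that code point).
    """
    as_code_point = chr(value)
    as_latin1 = bytes([value]).decode("latin-1")
    utf8_hex = " ".join(f"{b:02x}" for b in as_code_point.encode("utf-8"))
    return as_code_point, as_latin1, utf8_hex


for value in (0xC8, 0xFD):
    print(hex(value), *compare(value))
# 0xc8 È È c3 88
# 0xfd ý ý c3 bd
```

So the single byte and the code point agree under Latin-1, but the UTF-8 encoding of either character is two bytes, which is not what is on my wire.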
Are these 8-bit runes? (!)
Is there a name for such a thing?
Is this common?
Or is it just an MS code page whose values above 0x7f happen (by design) to map
onto the same letters as UTF-8?
Thanks in advance for any useful suggestions ☺
-Steve