Hi Paul,
> I see; not consistent to me, but at least it explains some things ... except
> "slnunicode.sub("éî", 2, 2)" returns "î" and not the second byte of "é", so
> obviously there are exceptions. (Or did I get something wrong again?)
No, there are no exceptions. Only "." in a pattern means "byte"; everything else means "character".
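For contrast, here is what the plain Lua string library does with the same input. This is only a sketch. It uses the standard library (Lua 5.1 or later), not slnunicode. "éî" is written as its UTF-8 bytes so the result does not depend on the encoding of the source file:

```lua
-- The standard string library counts bytes, not characters.
-- "éî" is written as its UTF-8 bytes (C3 A9 C3 AE) so the example
-- does not depend on the encoding of the source file.
local s = "\195\169\195\174"  -- "éî"

print(#s)                             -- 4: two bytes per character
print(string.sub(s, 2, 2) == "\169")  -- true: the second byte of "é"

-- In a pattern, "." matches exactly one byte, so a single "é"
-- is not matched by one dot, but is matched by two.
print(string.match("\195\169", "^.$"))          -- nil
print(string.match("\195\169", "^..$") ~= nil)  -- true
```

So with the plain string library, sub(…, 2, 2) really does return the second byte of "é", which is exactly what slnunicode's sub does not do.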
The idea is IMO the following:
* Lua strings can hold binary data. In the patterns used by find, match, gmatch
and gsub, the . matches one byte, while everything else (such as %d) uses
character classes and matches characters (digits, punctuation, and so on).
This lets you parse almost any file format rather easily.
* slnunicode is a drop-in replacement for the string functions. The evidence
for me is that the function names and arguments are identical.
* Because it is a drop-in replacement, it has to behave like the regular
string functions, especially with regard to binary data. That means . is one
byte and %x is a character class, but one that takes the UTF-8 byte sequence
into account.
* All functions that do not use patterns work on UTF-8 byte sequences (i.e.
len, lower, reverse, sub, upper). The functions that use patterns (find, match,
gmatch and gsub) distinguish between UTF-8 sequences (character classes) and
single bytes (the dot). I don't know about byte and char. I guess they belong
to the former group, but I don't want to look this up now.
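To illustrate the "UTF-8 byte sequence" side in plain Lua, a character-aware len and sub can be built on top of the byte-based string library by matching whole UTF-8 sequences. This is a rough sketch that assumes valid UTF-8 input and positive indices. The helper names utf8_len and utf8_sub are hypothetical. This is not how slnunicode itself is implemented:

```lua
-- Hypothetical helpers (not part of slnunicode or the Lua standard
-- library): count and slice a UTF-8 string by characters instead of
-- bytes, using only standard Lua 5.1+ string functions.
-- Assumes the input is valid UTF-8.

-- One UTF-8 character: a byte that is not a continuation byte
-- (0x80-0xBF), followed by any number of continuation bytes.
local UTF8_CHAR = "[^\128-\191][\128-\191]*"

-- Number of characters (not bytes) in s.
local function utf8_len(s)
  local n = 0
  for _ in string.gmatch(s, UTF8_CHAR) do
    n = n + 1
  end
  return n
end

-- Like string.sub, but i and j count characters.
-- Only positive indices are handled; i is clamped to 1, j to the length.
local function utf8_sub(s, i, j)
  local chars = {}
  for c in string.gmatch(s, UTF8_CHAR) do
    chars[#chars + 1] = c
  end
  i = math.max(i, 1)
  j = math.min(j, #chars)
  return table.concat(chars, "", i, j)  -- "" when i > j
end

local s = "\195\169\195\174"  -- "éî"
print(#s, utf8_len(s))                  -- prints 4 and 2 (bytes, characters)
print(utf8_sub(s, 2, 2) == "\195\174")  -- true: "î", as in the quoted example
```

Note that gmatch itself still works on bytes here. Only the pattern groups those bytes into characters.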
And I don't think replacing slnunicode with something else is necessary: if it
isn't broken, don't fix it.
But I'll shut up now unless asked again.
Patrick