On Tue, Mar 2, 2010 at 2:01 PM, Stephan Hennig <[email protected]> wrote:
> On 02.03.2010 07:49, Taco Hoekwater wrote:
>
>> Luatex itself has an internal UTF-8 counting function. At some point
>> (don't know when but before 1.0) the internal Unicode library will
>> replace slnunicode, and I will make sure that it exports a counter as
>> well.
>
> Good to know. For the time being this paragraph from the LuaTeX manual
>
>> Note: The string library functions find etc. are not Unicode-aware.
>> In cases where this is required (i. e. because the pattern used for
>> searching contains characters above code point 127), the
>> corresponding functions from unicode.utf8 should be used.
>
> is a bit misleading, since just unicode.utf8.find is again not
> Unicode-aware. The same applies for the empty capture () in match and
> gmatch, BTW. The output of
>
> str = "abcde"
> print(unicode.utf8.match(str, "()e"))
> str = "Äabcde"
> print(unicode.utf8.match(str, "()e"))
>
> is 5 and 7. The second one is obviously wrong.

I believe 7 is OK. In UTF-8, "Äabcde" is 7 octets long (Ä is encoded as the two bytes C3 84), so "e" starts at byte 7. And unittest.c says:

  NOTE: find positions are in bytes for all ctypes!
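If you need the character position instead, you can convert the byte offset yourself. Below is a minimal sketch, assuming plain Lua 5.1 or later and a UTF-8 encoded source file. It uses string.match rather than unicode.utf8.match so that it runs outside LuaTeX, and byte_to_char_pos is a hypothetical helper, not part of slnunicode. The idea is to count every byte before the offset that is not a UTF-8 continuation byte (0x80..0xBF), because each such byte starts a new character.

```lua
-- Hypothetical helper (not part of slnunicode): convert a 1-based byte
-- offset in a UTF-8 string into a 1-based character position.
local function byte_to_char_pos(s, pos)
  local count = 0
  for i = 1, pos - 1 do
    local b = s:byte(i)
    -- Bytes 0x80..0xBF are continuation bytes; any other byte starts a character.
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count + 1
end

local str = "Äabcde"
local bytepos = str:match("()e")              -- position capture: a byte offset
print(bytepos, byte_to_char_pos(str, bytepos)) -- 7 (octets) and 6 (characters)
```

The helper assumes the input is valid UTF-8 and that pos falls on a character boundary, which is always the case for a () capture taken from a match.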
-- luigi
