Hi,

I'm having trouble getting the position of a character in a UTF-8 string with slnunicode. The attached Lua script reads two (I think UTF-8 encoded) strings, 'äb' and 'öäb', from a file and prints their length and the position of the last character, 'b'. (UTF-8 characters appear scrambled in the console output because this is on Windows. But that shouldn't matter, should it?)

> >texlua slnunicode-find.lua
> line = äb
> len(line) = 2
> character 'b' at position 3
>
> line = öäb
> len(line) = 3
> character 'b' at position 5

I would expect the positions of 'b' to be 2 and 3, respectively, since those are the lengths of the strings as returned by unicode.utf8.len. However, unicode.utf8.find seems to use a different notion of position: the values it returns are apparently byte positions. To convert them manually, one would need to know how many of the characters preceding 'b' are encoded with more than one byte. Actually, I thought that is exactly what slnunicode is made for.
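For what it's worth, the manual conversion does not require knowing the length of each preceding character. In UTF-8 every character starts with exactly one byte that is not a continuation byte (continuation bytes are 0x80 to 0xBF), so counting the non-continuation bytes before a byte position gives the number of complete characters before it. A minimal sketch using only the standard Lua string library, assuming valid UTF-8 input; byte_to_char_pos is a hypothetical name, not a slnunicode function:

```lua
-- Hypothetical helper: convert a 1-based byte position in a valid UTF-8
-- string into a 1-based character position.
local function byte_to_char_pos(s, bytepos)
  -- Count the bytes before bytepos that are NOT continuation bytes
  -- (0x80..0xBF); each such byte starts one complete character.
  local _, count = s:sub(1, bytepos - 1):gsub("[^\128-\191]", "")
  return count + 1
end

-- "\195\182\195\164b" is "öäb" written as explicit UTF-8 bytes.
local s = "\195\182\195\164b"
local bytepos = string.find(s, "b", 1, true)   -- plain find, byte position
print(bytepos, byte_to_char_pos(s, bytepos))   --> 5   3
```

The byte position 5 matches what unicode.utf8.find reports for 'öäb' above, and the converted value 3 is the expected character position.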

What is the preferred way to get the (character) position of a character in a UTF-8 string, given that the string contains only 'letters'?
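One approach that avoids byte offsets altogether is to walk the string one character at a time and count. A minimal sketch in plain Lua, assuming valid UTF-8 input and that the character searched for is passed as a single UTF-8 encoded character; char_index is a hypothetical name, and this is not necessarily how slnunicode intends it to be done:

```lua
-- Hypothetical helper: return the character position of the first
-- occurrence of the single UTF-8 character `ch` in `s`, or nil.
local function char_index(s, ch)
  local i = 0
  -- Each match is one complete UTF-8 character: a non-continuation
  -- byte followed by any continuation bytes (0x80..0xBF).
  for c in s:gmatch("[^\128-\191][\128-\191]*") do
    i = i + 1
    if c == ch then return i end
  end
  return nil
end

-- "\195\164b" is "äb", "\195\182\195\164b" is "öäb".
print(char_index("\195\164b", "b"), char_index("\195\182\195\164b", "b"))  --> 2   3
print(char_index("\195\182\195\164b", "\195\164"))                        --> 2
```

Because the comparison is done on whole characters, this also works when the character searched for is itself multi-byte, such as 'ä'.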

Best regards,
Stephan Hennig


>texlua -v
This is LuaTeX, Version beta-0.40.6-2009110118 (Web2C 2009) luatex.web >= v14240

äb
öäb
io.input("words.utf8")
for line in io.lines() do
   print("line = " .. line)
   -- length in characters
   print("len(line) = " .. unicode.utf8.len(line))
   -- find returns start and end; only the start is used in the concatenation
   print("character 'b' at position " .. unicode.utf8.find(line, "b"))
   print()
end
