[luatex] problem with slnunicode's find

Stephan Hennig Mon, 01 Mar 2010 10:23:39 -0800

Hi,

I have trouble getting the position of a character in a UTF-8 string with slnunicode. The attached Lua script reads two UTF-8 encoded (I think) strings, 'äb' and 'öäb', from a file and outputs their length and the position of the last character 'b'. (UTF-8 characters are scrambled in the output, because this is on a Windows console. But that shouldn't harm, should it?)


> >texlua slnunicode-find.lua
> line = ├ñb
> len(line) = 2
> character 'b' at position 3
>
> line = ├Â├ñb
> len(line) = 3
> character 'b' at position 5

I would expect the positions of 'b' to be 2 and 3, respectively, as those are the lengths of the strings as returned by unicode.utf8.len. However, unicode.utf8.find seems to have a different notion of the length of a string. To correct these values manually (apparently they are byte positions), one would need to know how many of the characters preceding 'b' are multiple bytes long. Actually, I thought that is what slnunicode is made for.

What is the preferred way to get the position of a character in a UTF-8 string, given that the string contains only 'letters'?


Best regards,
Stephan Hennig

>texlua -v
This is LuaTeX, Version beta-0.40.6-2009110118 (Web2C 2009) luatex.web >= v14240

äb
öäb

io.input("words.utf8")
for line in io.lines() do
   print("line = " .. line)
   print("len(line) = " .. unicode.utf8.len(line))
   print("character 'b' at position " .. unicode.utf8.find(line, "b"))
   print()
end
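
The manual correction described above (turning a byte position into a character position) can be sketched in plain Lua. This is only an illustration under the assumption that the values reported by find are byte offsets and that the input is valid UTF-8. The helper names utf8_char_index and utf8_find_char are hypothetical and not part of slnunicode.

```lua
-- Hypothetical helper (not part of slnunicode): convert a 1-based byte
-- offset into a 1-based character index, assuming valid UTF-8 input.
local function utf8_char_index(s, byte_pos)
   local count = 0
   for i = 1, byte_pos do
      local c = s:byte(i)
      -- Continuation bytes have the form 10xxxxxx (0x80..0xBF).
      -- Every other byte starts a new character, so count only those.
      if c < 0x80 or c >= 0xC0 then
         count = count + 1
      end
   end
   return count
end

-- Hypothetical wrapper: plain (non-pattern) search for needle, returning
-- its character position instead of its byte position, or nil.
local function utf8_find_char(s, needle)
   local byte_pos = string.find(s, needle, 1, true)
   if not byte_pos then
      return nil
   end
   return utf8_char_index(s, byte_pos)
end

-- The two test strings, written with byte escapes so that the script
-- itself does not depend on the editor's encoding.
print(utf8_find_char("\195\164b", "b"))          -- 'äb'  -> 2
print(utf8_find_char("\195\182\195\164b", "b"))  -- 'öäb' -> 3
```

Counting only the bytes that are not continuation bytes gives the same notion of length that unicode.utf8.len reports for these two strings (2 and 3).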
