Re: [luatex] problem with slnunicode's find

Stephan Hennig Mon, 01 Mar 2010 14:18:19 -0800

On 01.03.2010 19:42, Patrick Gundlach wrote:

>> I would expect the positions of 'b' to be 2 and 3, respectively, as
>> those are the lengths of the strings as returned by unicode.utf8.len.
>> However, unicode.utf8.find seems to have another notion of the
>> length of a string.


> It is documented. (Well, sort of: you need to download the slnunicode
> library and look into 'unittest'.)


Thanks for the pointer!

> -- NOTE: find positions are in bytes for all ctypes!

> -- use ascii.sub to cut found ranges!

Hmm, I neither want to cut anything nor do I have a range available.
I just want to count. Attached is my attempt at a UTF-8-aware find
function based on the UTF-8-aware parts of slnunicode. Comments and
improvements are welcome!

> -- this is a) faster b) more reliable


But it leaves this simple case uncovered. :/
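
To make the byte/character mismatch concrete, here is a minimal sketch in plain Lua that does not need slnunicode. The sample string and the helper name count_chars are made up for this illustration, and the counting trick assumes well-formed UTF-8 input.

```lua
-- "ö" is the two-byte UTF-8 sequence C3 B6. It is written with decimal
-- escapes so the example does not depend on this file's encoding.
local o_uml = "\195\182"
local sample = o_uml .. "b"      -- "öb": 2 characters, 3 bytes

-- Hypothetical helper: count characters by counting every byte that is
-- not a UTF-8 continuation byte (continuation bytes are 0x80..0xBF).
function count_chars(str)
   local _, n = string.gsub(str, "[^\128-\191]", "")
   return n
end

print(#sample)                            --> 3 (bytes)
print(count_chars(sample))                --> 2 (characters)
print(string.find(sample, "b", 1, true))  --> 3 3 (byte positions)
```

The 'b' is the second character but sits at byte 3, which is the kind of offset a byte-based find reports.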

Best regards,
Stephan Hennig

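-- UTF-8-aware find: returns the start and end of the first match of
-- pattern in str as character (not byte) positions, or nil if there is
-- no match. start is a character position and defaults to 1.
-- Note: pattern is used as a Lua pattern; the returned start position
-- is only correct if the match is as long as the pattern itself.
function utf8_find(str, pattern, start)
   start = start or 1
   local len_pat = unicode.utf8.len(pattern)
   local tail = unicode.utf8.sub(str, start)
   -- search for first occurrence of pattern
   local prefix = unicode.utf8.match(tail, "^.-" .. pattern)
   local fin = prefix and start + unicode.utf8.len(prefix) - 1
   return fin and fin - len_pat + 1, fin
end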


function showMatches(s, pattern)
   io.write("pattern '" .. pattern .. "' at positions")
   local start, fin = 0, 0
   while true do
      start, fin = utf8_find(s, pattern, start + 1)
      if not start then break end
      io.write(" (" .. start .. "," .. fin .. ")")
   end
   io.write("\n")
end


io.input("words.utf8")
for line in io.lines() do
   print("line = " .. line)
   print("len(line) = " .. unicode.utf8.len(line))
   showMatches(line, "ä")
   showMatches(line, "ö")
   showMatches(line, "öö")
   print()
end

#böö#bb#
ö#ä#öööbbb#ööb##ö
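
A different route to character positions, sketched here only as an alternative to the utf8_find attachment above: keep the byte-based search and convert its byte offsets afterwards. The sketch uses plain Lua string functions only. The names utf8_char_count, utf8_byte_offset and utf8_find_plain are hypothetical, the search is a plain (non-pattern) search, and both arguments are assumed to be well-formed UTF-8.

```lua
-- Number of UTF-8 characters that start within the first n bytes of str
-- (every byte outside 0x80..0xBF starts a character).
function utf8_char_count(str, n)
   local _, count = string.gsub(string.sub(str, 1, n), "[^\128-\191]", "")
   return count
end

-- Byte position at which character number char_pos starts,
-- or #str + 1 if the string has fewer characters.
function utf8_byte_offset(str, char_pos)
   local seen = 0
   for i = 1, #str do
      local b = string.byte(str, i)
      if b < 128 or b > 191 then          -- not a continuation byte
         seen = seen + 1
         if seen == char_pos then return i end
      end
   end
   return #str + 1
end

-- Plain-text find that takes and returns character positions.
function utf8_find_plain(str, needle, init)
   local from = utf8_byte_offset(str, init or 1)
   local b_start, b_end = string.find(str, needle, from, true)
   if not b_start then return nil end
   return utf8_char_count(str, b_start), utf8_char_count(str, b_end)
end

-- First test line from above, with "ö" written as byte escapes.
local o = "\195\182"
local line = "#b" .. o .. o .. "#bb#"
print(utf8_find_plain(line, o))        --> 3 3
print(utf8_find_plain(line, o .. o))   --> 3 4
print(utf8_find_plain(line, "b", 3))   --> 6 6
```

Because the needle is searched as plain text, pattern characters such as "." or "%" have no special meaning here, unlike in utf8_find.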
