On 03.03.2010 02:03, luigi scarso wrote:
On Tue, Mar 2, 2010 at 8:15 PM, Stephan Hennig <[email protected]> wrote:
On 02.03.2010 18:25, luigi scarso wrote:
On Tue, Mar 2, 2010 at 4:28 PM, Stephan Hennig <[email protected]> wrote:

  While the latter two functions in general
{\it are} \UNICODE|-|aware, they fall back to non|-|\UNICODE|-|aware
behaviour when using the empty capture \lua{()} (other captures work as
expected).

Hm, I don't understand this.

Neither do I. :)  SCNR
I mean: you said that the empty capture is not Unicode-aware
but the others are OK (regarding match and gmatch).
Can you make a small example?

I wanted to mail you off-list, anyway. It was just late yesterday. Here is an example:

str = "ä#Ö"
print("str: ", str)

-- This considers 'Ö' a single upper-case letter, i.e.,
-- 'Ö' is one (character) long.
print('match("%u"): ', unicode.utf8.match(str, "(%u)"))
-- Like len does.
print('len("Ö"): ', unicode.utf8.len("Ö"))

-- This returns the byte position of 'Ö' in the string, i.e.,
-- it considers the length of 'ä' as two (bytes).
print('match("()%u"): ', unicode.utf8.match(str, "()%u"))
-- Unlike len.
print('len("ä"): ', unicode.utf8.len("ä"))

>texlua empty.lua
str:    ä#Ö
match("%u"):    Ö
len("Ö"):      1
match("()%u"):  4
len("ä"):      1

Note that the empty capture () doesn't return the matched text but its position within the string in case of a match, similar to find. So it is no surprise that it returns byte positions. But one can argue about whether that is documented behaviour.
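
If a character position is wanted instead, the byte position can be converted by hand. The following is only a minimal sketch in plain Lua, not something the unicode library provides: byte_to_char_pos is a hypothetical helper name, and it assumes the string is valid UTF-8 and that the byte position falls on a character boundary (or one past the end of the string).

```lua
-- Hypothetical helper (not part of the unicode library): convert a
-- 1-based byte position in a UTF-8 string into a 1-based character
-- position. Assumes valid UTF-8 and a position on a character boundary.
local function byte_to_char_pos(s, byte_pos)
  local chars_before = 0
  for i = 1, byte_pos - 1 do
    local b = s:byte(i)
    if not b then break end  -- byte_pos lies beyond the end of the string
    -- Count every byte that is not a continuation byte (10xxxxxx),
    -- i.e. every byte that starts a new character.
    if b < 0x80 or b >= 0xC0 then
      chars_before = chars_before + 1
    end
  end
  return chars_before + 1
end

-- "ä#Ö" written as explicit UTF-8 bytes, so the source file encoding
-- does not matter.
local str = "\195\164#\195\150"

-- 4 is the byte position reported by the empty capture in the run above.
print(byte_to_char_pos(str, 4))  -- prints 3
```

With this, 'Ö' is reported as the third character, which agrees with what len returns for the individual letters.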

Best regards,
Stephan Hennig
