On 03.03.2010 02:03, luigi scarso wrote:
On Tue, Mar 2, 2010 at 8:15 PM, Stephan Hennig<[email protected]> wrote:
On 02.03.2010 18:25, luigi scarso wrote:
On Tue, Mar 2, 2010 at 4:28 PM, Stephan Hennig<[email protected]> wrote:
While the latter two functions in general
{\it are} \UNICODE|-|aware, they fall back to non|-|\UNICODE|-|aware
behaviour when using the empty capture \lua{()} (other captures work as
expected).
Hm, I don't understand this.
Neither do I. :) SCNR
I mean: you said that the empty capture is not Unicode-aware,
but the others are OK (for match and gmatch).
Can you make a small example?
I wanted to mail you off-list, anyway. It was just late yesterday.
Here is an example:
str = "ä#Ö"
print("str: ", str)
-- This considers 'Ö' a single upper-case letter, i.e.,
-- 'Ö' is one (character) long.
print('match("%u"): ', unicode.utf8.match(str, "(%u)"))
-- Like len does.
print('len("Ö"): ', unicode.utf8.len("Ö"))
-- This returns the byte position of 'Ö' in the string, i.e.,
-- it considers the length of 'ä' as two (bytes).
print('match("()%u"): ', unicode.utf8.match(str, "()%u"))
-- Unlike len.
print('len("ä"): ', unicode.utf8.len("ä"))
>texlua empty.lua
str: ä#Ö
match("%u"): Ö
len("Ö"): 1
match("()%u"): 4
len("ä"): 1
Note that the empty capture () doesn't return the matched text, but the
position within the string where the match occurs, similar to find. So it
is no surprise that it returns byte positions. Whether that is documented
behaviour is debatable, though.
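If a character index is needed instead, the byte position can be converted by hand. Here is a minimal sketch in plain Lua, assuming the string is valid UTF-8. The helper name byte_to_char_pos is hypothetical and not part of the unicode library; the test string is written with decimal escapes so that the example does not depend on the encoding of the source file.

```lua
-- Hypothetical helper (not part of the unicode library): convert a 1-based
-- byte position in a UTF-8 string into a 1-based character position.
-- Assumes s is valid UTF-8.
local function byte_to_char_pos(s, byte_pos)
  local chars = 0
  for i = 1, byte_pos do
    local b = string.byte(s, i)
    if not b then break end  -- byte_pos lies beyond the end of the string
    -- Bytes 0x80..0xBF are UTF-8 continuation bytes; any other byte
    -- starts a new character.
    if b < 0x80 or b > 0xBF then
      chars = chars + 1
    end
  end
  return chars
end

-- "ä#Ö" in UTF-8: ä = C3 A4, # = 23, Ö = C3 96
local str = "\195\164#\195\150"
print(byte_to_char_pos(str, 4))  --> 3
```

With the example string above, byte position 4 (as returned by the empty capture) maps to character position 3, which agrees with what len reports for the individual characters.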
Best regards,
Stephan Hennig