On 12/12/2013 10:59 AM, Paul Isambert wrote:
From: "Patrick Gundlach" <[email protected]>
space
not a space

that was the easy part... Now the question is "why"... (It's clear
when you add anchors ^ and $ to the pattern).

I'll admit I don't get it. When I saw that

     unicode.utf8.match("à", "%s")

returned true, I thought: "à" is "C3 A0" in UTF-8, but Lua knows about latin-1
only, and "A0" is the non-breaking space, hence the false positive. And then,
of course: but isn't unicode.utf8.match() supposed to know about UTF-8? What
good is it if it can't spot a multibyte character?

Then I tried

     string.match("à", "%s")

and it returned false, meaning actually the non-breaking space isn't
recognized by "%s", so my first explanation was wrong anyway.
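
The byte-level part of that reasoning can be checked with the stock string library alone. This is a minimal sketch, assuming plain Lua running in its default "C" locale; the variable name a_grave is only illustrative, and the character is written as explicit byte escapes so the file encoding does not matter:

```lua
-- "à" as its two UTF-8 bytes, 0xC3 0xA0 (decimal 195 160)
local a_grave = "\195\160"

-- string.byte returns the numeric value of each byte in the range
local b1, b2 = string.byte(a_grave, 1, 2)
print(string.format("%02X %02X", b1, b2))   -- C3 A0

-- The stock string library matches byte by byte. In the "C" locale
-- neither 0xC3 nor 0xA0 counts as whitespace, so "%s" finds nothing.
print(string.match(a_grave, "%s"))          -- nil
```

This only shows what the byte-oriented string.match sees; it says nothing about how unicode.utf8.match treats the same input.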

I may be missing something here, being quite tired, but it seems to me
slnunicode is buggy or what?

it's some optimization (i remember noticing similar things) ... i think that "%s" becomes a quick and dirty match, without looking at each character as utf-8

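-- assumption: match is slnunicode's matcher, as in the call quoted above
local match = unicode.utf8.match

if match("xà","x%s") then
    print("space")
else
    print("not a space")
end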

works as expected

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
    tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
