On 12/12/2013 10:59 AM, Paul Isambert wrote:
From: "Patrick Gundlach" <[email protected]>
space
not a space
that was the easy part... Now the question is "why"... (It's clear
when you add anchors ^ and $ to the pattern).
I'll admit I don't get it. When I saw that
unicode.utf8.match("à", "%s")
returned true, I thought: "à" is "C3 A0" in UTF-8, but Lua knows about latin-1
only, and "A0" is the non-breaking space, hence the false positive. And then,
of course: but isn't unicode.utf8.match() supposed to know about UTF-8? What
good is it if it can't spot a multibyte character?
Then I tried
string.match("à", "%s")
and it returned false, meaning actually the non-breaking space isn't
recognized by "%s", so my first explanation was wrong anyway.
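To make the byte-level reasoning above concrete, here is a minimal sketch in plain Lua (standard string library only, no slnunicode). The helper name hex_bytes is made up for illustration, and "à" is written with decimal escapes so the result does not depend on how the file is saved.

```lua
-- "à" (U+00E0) encoded as UTF-8 is the two bytes C3 A0.
-- Decimal escapes keep the example independent of the file encoding.
local a_grave = "\195\160"

-- hex_bytes is a hypothetical helper: it lists the bytes of a string in hex.
local function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(out, " ")
end

print(hex_bytes(a_grave))          -- C3 A0

-- The second byte has the same value as NO-BREAK SPACE in Latin-1.
print(a_grave:byte(2) == 0xA0)     -- true

-- The result of the stock matcher depends on the C locale in effect,
-- so no expected output is given here.
print(string.match(a_grave, "%s"))
```

So the second byte really is A0; whether a given matcher treats that byte as whitespace is a separate question.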
I may be missing something here, being quite tired, but it seems to me
slnunicode is buggy or what?
it's some optimization (i remember noticing similar things) ... i think
that a lone "%s" becomes a quick and dirty byte match, without looking at
each character as utf
local match = unicode.utf8.match -- assumed alias; needs slnunicode (e.g. in LuaTeX)
if match("xà","x%s") then
print("space")
else
print("not a space")
end
works as expected
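The suspected difference can be sketched in plain Lua. This is only an illustration of the hypothesis, not slnunicode's actual code: is_space_cp, has_space_bytewise and has_space_utf8 are made-up names, the whitespace set is a small assumed subset, and the decoder expects well-formed UTF-8 without validating it.

```lua
-- Hypothetical whitespace test on a code point (assumed subset, not a real table).
local function is_space_cp(cp)
  return cp == 0x20 or (cp >= 0x09 and cp <= 0x0D) or cp == 0xA0
end

-- Byte-wise scan: treats every byte as if it were a code point.
local function has_space_bytewise(s)
  for i = 1, #s do
    if is_space_cp(s:byte(i)) then return true end
  end
  return false
end

-- Code-point scan: decodes each UTF-8 sequence before testing it.
local function has_space_utf8(s)
  local i = 1
  while i <= #s do
    local b = s:byte(i)
    local cp, len
    if b < 0x80 then cp, len = b, 1
    elseif b < 0xE0 then cp, len = b - 0xC0, 2
    elseif b < 0xF0 then cp, len = b - 0xE0, 3
    else cp, len = b - 0xF0, 4 end
    -- Fold in the continuation bytes, 6 bits each.
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (s:byte(j) - 0x80)
    end
    if is_space_cp(cp) then return true end
    i = i + len
  end
  return false
end

local a_grave = "\195\160"            -- "à", U+00E0
print(has_space_bytewise(a_grave))    -- true  (the A0 byte looks like NBSP)
print(has_space_utf8(a_grave))        -- false (U+00E0 is a letter)
```

If a shortcut of the first kind is taken for a bare "%s", it would explain the false positive, while the decoded path gives the expected answer.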
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------