Re: [luatex] problem with slnunicode's find

Stephan Hennig Tue, 02 Mar 2010 10:34:27 -0800

On 02.03.2010 17:18, luigi scarso wrote:

On Tue, Mar 2, 2010 at 4:39 PM, Stephan Hennig <[email protected]> wrote:

On 02.03.2010 14:41, luigi scarso wrote:


I believe 7 is ok, because in UTF-8 Äabcde is 7 octets long
and unittest.c says
  NOTE: find positions are in bytes for all ctypes!


Logicians might be satisfied with broken behaviour as long as it's
documented.

I believe that it's not broken behaviour; it's only a mix of two
different points of view:
"abstract" (or "sign" or "glyph" or "character"), where we see Ä as a "unit",
and "implementation", where Ä in UTF-8 is two octets.

Yes, that's why I call it "broken". Switching the point of view within
the unicode.utf8 functions doesn't seem a good design to me. I cannot
see why it could be sensible to regard the length of Ä as one
(character) in len and two (octets) in find. After all, we already have
function(s) that return byte positions in a string, string.find or
unicode.ascii.find. Why not drop unicode.utf8.find altogether? That'd
be a clear design. (Only beaten by a find function that regards Ä as
the same length as len does. There are use-cases for such a find
function.)
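
To make the mismatch concrete, here is a minimal sketch in plain Python (an illustration only, not the slnunicode API; the helper name byte_to_char_offset is made up for this example). It shows the length and the position of "e" in "Äabcde" once counted in characters and once in UTF-8 octets, and how a byte offset could be mapped back to a character offset. Offsets here are 0-based; the 1-based Lua equivalents would be one higher.

```python
# Illustration only: plain Python, not slnunicode.
text = "\u00c4abcde"             # "Äabcde" with a precomposed Ä (U+00C4)
data = text.encode("utf-8")      # the same string as UTF-8 octets

char_len = len(text)             # length counted in characters
byte_len = len(data)             # length counted in octets (Ä takes two)

# 0-based offset of "e", counted in characters and in octets.
char_pos = text.find("e")
byte_pos = data.find("e".encode("utf-8"))


def byte_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Map a 0-based byte offset to a 0-based character offset.

    Hypothetical helper for this example; assumes byte_offset falls on
    a character boundary, otherwise decoding raises UnicodeDecodeError.
    """
    return len(data[:byte_offset].decode("utf-8"))


print(char_len, byte_len)
print(char_pos, byte_pos)
print(byte_to_char_offset(data, byte_pos))
```

A find that agrees with len would report the character offset; the byte offset is what a byte-oriented search over the same data yields.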

But I'm not a logician, so I cannot agree. :)

To be honest, I'm not comfortable with regex and Unicode.

Perl can help here. Just to see an example:

#>  perl  -e '$str = "Äabcde"; print length($str),"\n" ;' ;
7
#>  perl  -e 'use utf8; $str = "Äabcde"; print length($str),"\n" ;' ;
6

Same with string.len and unicode.utf8.len in Lua. You made me curious.
Is there a find function in Perl? What values does that return?


Best regards,
Stephan Hennig
