Re: [fltk.general] LC_CTYPE, strcasecmp

corvid Tue, 18 Oct 2011 16:05:09 -0700

Ian wrote:
> On 18 Oct 2011, at 19:19, corvid wrote:
> > I mean that, suppose someone runs
> > $ export LC_CTYPE=tr_TR [or tr_TR.UTF-8]
> > $ dillo [or unittests]
> > ..and setlocale(LC_CTYPE, "") pulls the tr_TR from the environment
> > and sets the locale, and then strcasecmp("i", "I") in Turkish is
> > nonzero, whereas strcasecmp("i", "İ") or strcasecmp("I", "ı")
> > are zero.
>
> OK.... Turkish is way off my beat, so I am probably missing the point,
> I don't know what would be considered normal.
>
> For those following along at home, Turkish has two distinct forms of the
> letter I, one is dotted (always, even when capitalised as İ) the
> other is not-dotted (even when lower case, ı), and they make
> distinct (related) sounds.
>
> (And that's all going to look like gibberish if your mail reader can't
> handle the glyphs I just wrote...)
>
>
> Questions that might be relevant are...
>
> What does strcasecmp() make of the ("i", "ı") or ("I", "İ")
> cases?
> Presumably they are declared as non-matches too?


They should be.

> So it is not the case that is at issue here, but the fact that
> strcasecmp() thinks that dotted/non-dotted I letters are distinct?
>
> Are they generally considered as distinct? What happens when a text that
> is in a non-Turkish Latin script is parsed, by a Turkish system?
> In that case, it might be correct to parse i and I as equivalent (that's
> dotted-small-i and non-dotted-caps-I) since they probably are equivalent
> in the source language.
> This is (I think) the use-case Corvid is thinking about...

Yeah, when you know you're doing ASCII or something.
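
A minimal sketch of such an ASCII-only comparison follows. The names
ascii_tolower() and ascii_strcasecmp() are hypothetical, chosen for
illustration; they are not existing FLTK or dillo functions. The point is
only that the case folding never consults LC_CTYPE, so "i" and "I" match
under tr_TR as well.

```c
/* Hypothetical helper: fold one byte to lower case using ASCII rules only.
 * Bytes outside 'A'..'Z' (including UTF-8 lead/continuation bytes) are
 * returned unchanged, so the current locale is never consulted. */
static int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: strcasecmp() look-alike restricted to ASCII folding.
 * Returns 0 on a match, otherwise the difference of the first mismatching
 * (folded) bytes, compared as unsigned char. */
static int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p && ascii_tolower(*p) == ascii_tolower(*q)) {
        p++;
        q++;
    }
    return ascii_tolower(*p) - ascii_tolower(*q);
}
```

With this, ascii_strcasecmp("i", "I") is zero whatever setlocale() was
given, and the Turkish dotless and dotted forms (multi-byte in UTF-8) never
compare equal to plain "i" or "I".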

> Are there parallels in other languages that are pertinent?
> How are O and Ö handled (or U and Ü I guess) in languages that
> use them?
> Are they sorted as "the same" or as different letters?

I'm replying through the website because I'm on the daily digest,
and I can't tell what they say because something (not dillo!) ate
them, but, assuming that those have accents or umlauts or something,
they should be different.

> Corvid - is there a specific use that's problematic for you?
>
> From your earlier post, I take it that what is happening is that you
> are doing something like (dodgy pseudo-code...)
>
>    - find html <token>
>    - if strcasecmp(token, I) == 0 then start italic_mode
>    - etc...
>
> except under Turkish this fails because i != I in Turkish, whereas
> under most languages using Latin characters it would be?
>
> Would a workaround of doing
>
>    - find html <token>
>    - if strcmp(token, i) == 0 || strcmp(token, I) == 0 then start italic_mode
>    - etc...
>
> be totally out of the question? That at least should work for this
> particular case...

With that example, I was trying to show a simple case of
Fl_Help_View.cxx itself breaking if given a tag with an 'i' in it.
But, for instance, there's also:
fl_set_fonts_xft.cxx:        if (strncasecmp(style, "Italic", 6) == 0)
fl_set_fonts_xft.cxx:        if (strncasecmp(style, "Oblique", 7) == 0)
Fl_get_system_colors.cxx:  if (scheme_ && !strcasecmp(scheme_, "plastic")) {
etc.
(if they are called after the setlocale(), anyway)
I imagine some of the uses of fl_tolower() and fl_toupper() might be
a problem as well.
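
One possible way to shield prefix checks like the "Italic" and "Oblique"
ones above from the process locale, sketched under assumptions: POSIX.1-2008
provides strncasecmp_l(), which takes an explicit locale object, so the
comparison can be pinned to the "C" locale. The helper name
c_locale_has_prefix() is hypothetical, and this is not something FLTK is
known to do; strncasecmp_l() is not available on every platform FLTK
targets (Windows lacks it, and some systems declare it in a different
header), so treat it as an illustration rather than a portable fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if the first n bytes of s match prefix,
 * ignoring case according to the "C" locale rather than whatever
 * setlocale(LC_CTYPE, "") picked up from the environment. */
static int c_locale_has_prefix(const char *s, const char *prefix, size_t n)
{
    /* Build a locale object for the "C" locale; a real caller would
     * probably create this once and reuse it. */
    locale_t c_loc = newlocale(LC_CTYPE_MASK, "C", (locale_t)0);
    int match;

    if (c_loc == (locale_t)0)
        return 0; /* could not create the locale object: report no match */

    match = strncasecmp_l(s, prefix, n, c_loc) == 0;
    freelocale(c_loc);
    return match;
}
```

A call such as c_locale_has_prefix(style, "Italic", 6) would then behave
the same before and after setlocale().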
