Re: [fltk.general] STR #2771 [Turkic locales and str(n)casecmp, toupper, tolower]

Ian MacArthur Fri, 07 Dec 2012 03:26:20 -0800

On 7 Dec 2012, at 02:39, corvid wrote:

> I was just having a look at how that was all resolved (I was on vacation
> at the time, and paid very little attention to the goings-on)...
> 
> I see that there's an fl_ascii_strcasecmp() now, and that it's used
> to check the schemes and something involving xft fonts, but why wasn't
> it applied more generally? For instance, I see that Fl_Help_View.cxx
> still has code like 'if (strcasecmp(buf, "I") == 0' when checking tags.



I would not describe the issue as fixed, in any real sense - I'm not sure it is 
readily "fixable" from within fltk anyway.

For those following along at home, I'll just recap that the problem is that in 
the Turkic (and closely related) locales, there are actually two types of 
letter i/I, a dotted i and a non-dotted one...

Now, in *most* locales, the non-dotted form only exists as the capitalisation 
of the dotted form, but in the Turkic locales a small dotted i is capitalised 
as a BIG dotted I, and the BIG non-dotted I has its own lower case, non-dotted 
form.

But... practically all the letter-case-handling tools (in, well, any computer 
language really...), e.g. toupper/tolower/strcasecmp/etc., assume that the 
upper case form of i is I.

So... Unicode has specific code points for the Turkic-style upper case dotted 
I (U+0130) and lower case non-dotted i (U+0131), and a Unicode-aware version 
of strcasecmp (for example) might actually do the Right Thing.
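
To make that concrete, here is a minimal sketch of what Turkic-style case 
mapping looks like at the code point level. The function names are made up for 
illustration (they are not fltk or libc API), and only the basic A-Z/a-z range 
plus the two Turkic code points are handled; it assumes the text has already 
been decoded to code points.

```c
#include <stdint.h>

/* Code points involved (per the Unicode standard):
 *   U+0049  LATIN CAPITAL LETTER I
 *   U+0069  LATIN SMALL LETTER I
 *   U+0130  LATIN CAPITAL LETTER I WITH DOT ABOVE
 *   U+0131  LATIN SMALL LETTER DOTLESS I
 */

/* Hypothetical helper: upper-case one code point using Turkic rules. */
uint32_t turkic_toupper(uint32_t cp)
{
    if (cp == 0x0069) return 0x0130;   /* dotted i  -> dotted capital I   */
    if (cp == 0x0131) return 0x0049;   /* dotless i -> plain capital I    */
    if (cp >= 'a' && cp <= 'z') return cp - ('a' - 'A');
    return cp;                         /* everything else left unchanged  */
}

/* Hypothetical helper: lower-case one code point using Turkic rules. */
uint32_t turkic_tolower(uint32_t cp)
{
    if (cp == 0x0049) return 0x0131;   /* plain capital I  -> dotless i   */
    if (cp == 0x0130) return 0x0069;   /* dotted capital I -> dotted i    */
    if (cp >= 'A' && cp <= 'Z') return cp + ('a' - 'A');
    return cp;
}
```

Note that with these rules the ASCII pair i/I is no longer an upper/lower 
pair, which is exactly what trips up code that compares tags or keywords 
case-insensitively.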

However... it turns out that most (all?) tools, even in the Turkic locales, 
just use the "ASCII" values for i and I, thus reducing to almost zero any 
prospect of the code correctly figuring out which glyph to use for the upper / 
lower case conversion...

Outcome: I don't think this can be fixed in fltk alone; I suspect that a more 
"global" change might be required.

I imagine that a program that needs to handle Turkic-style locales could 
"sanitize" its input text by scanning through it, looking for any occurrences 
of the ASCII i/I values and for each, determining from its context which type 
of i it *should* be and replacing with the full Unicode value as appropriate. 
For each letter i found, there are three possible outcomes:

- It might be a Turkic dotted I
- It might be a Turkic non-dotted I
- It might actually be an ASCII "regular" I

That third case *probably* covers the Fl_Help_View situation, where the tags 
being scanned in an html view are probably assumed to be (essentially) ASCII 
text, not localised text in another (e.g. Turkic) locale...
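
For that ASCII-only case, a comparison that never consults the locale avoids 
the problem entirely. The following is only a sketch of the idea, with a 
made-up name; it is not the actual fl_ascii_strcasecmp() source, which I have 
not reproduced here.

```c
/* Fold A-Z to a-z only; never looks at the current locale. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: case-insensitive compare for ASCII text such as
 * html tag names. Bytes outside A-Z/a-z are compared as-is, so UTF-8
 * sequences (e.g. a dotless i) never match an ASCII letter.
 * Returns <0, 0 or >0, like strcmp(). */
int my_ascii_strcasecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = ascii_lower((unsigned char)*a);
        int cb = ascii_lower((unsigned char)*b);
        if (ca != cb) return ca - cb;   /* first differing byte decides  */
        if (ca == 0) return 0;          /* both strings ended together   */
    }
}
```

With something like this, a check such as my_ascii_strcasecmp(buf, "I") == 0 
gives the same answer for "i" and "I" whatever the user's locale is set to.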

Does that answer the question...?
-- 
Ian



