Re: [fltk.general] STR #2771 [Turkic locales and str(n)casecmp, toupper, tolower]

corvid Fri, 07 Dec 2012 09:49:31 -0800

Ian wrote:
> On 7 Dec 2012, at 02:39, corvid wrote:
> 
> > I was just having a look at how that was all resolved (I was on vacation
> > at the time, and paid very little attention to the goings-on)...
> > 
> > I see that there's an fl_ascii_strcasecmp() now, and that it's used
> > to check the schemes and something involving xft fonts, but why wasn't
> > it applied more generally? For instance, I see that Fl_Help_View.cxx
> > still has code like 'if (strcasecmp(buf, "I") == 0)' when checking tags.
> 
> I would not describe the issue as fixed, in any real sense - I'm not sure it 
> is readily "fixable" from within fltk anyway.
> 
> For those following along at home, I'll just recap that the problem is that 
> in the Turkic (and closely related) locales, there are actually two types of 
> letter i/I, a dotted i and a non-dotted one...
> 
> Now, in *most* locales, the non-dotted form only exists as the capitalisation 
> of the dotted form, but in the Turkic locales a small dotted i is capitalised 
> as a BIG dotted I, and the BIG non-dotted I has a lower case, non-dotted form.
> 
> But... practically all the letter-case-handling tools (in, well, any computer 
> language really...), e.g. toupper/tolower/strcasecmp/etc., assume that the 
> upper case form of i is I.
> 
> So... in Unicode there are specific codes for the Turkic-style upper and 
> lower case dotted/non-dotted I glyphs, and a Unicode-aware version of 
> strcasecmp (for example) might actually do the Right Thing.
> 
> However... it turns out that most (all?) tools, even in the Turkic locales, 
> just use the "ASCII" values for i and I, thus reducing to almost zero any 
> prospect of the code correctly figuring out which glyph to use for the upper 
> / lower case conversion...
>
> Outcome: I don't think this can be fixed in fltk alone, I suspect that more 
> "global" change might be required.
> 
> I imagine that a program that needs to handle Turkic-style locales could 
> "sanitize" its input text by scanning through it, looking for any occurrences 
> of the ASCII i/I values and for each, determining from its context which type 
> of i it *should* be and replacing with the full Unicode value as appropriate. 
> For each letter i found, there are three possible outcomes:
> 
> - It might be a Turkic dotted I
> - It might be a Turkic non-dotted I
> - It might actually be an ASCII "regular" I
> 
> That third case *probably* covers the Fl_Help_View situation, where scanning 
> tags in an html view is probably assumed to be (essentially) ASCII text, not 
> localised text in another (e.g. Turkic) locale...
> 
> Does that answer the question...?


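In Turkish text, 'i' and 'I' are just the ordinary ASCII 'i' and 'I'; those
don't have separate code points. Only the dotted capital (U+0130) and the
dotless lowercase (U+0131) have code points of their own.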

As for whether a given bit of code wants locale-based handling of text or is
working in a plain ASCII context, most cases will be unambiguous, and those
can be fixed.

FLTK does a small enough amount of str[n]casecmp()ing overall that...I'll go
ahead and do some analysis...
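
For the unambiguous ASCII cases (tag names and the like), a comparison that
never consults the locale could look roughly like the sketch below. The names
ascii_tolower and ascii_strcasecmp are made up for illustration, and this is
not necessarily how fl_ascii_strcasecmp() is actually written; it only shows
the idea of folding A-Z by hand instead of calling tolower().

```cpp
// Hypothetical helper: lower-case ASCII A-Z only. Every other byte,
// including the bytes of UTF-8 sequences, is returned unchanged, so the
// result cannot depend on the current locale.
static inline int ascii_tolower(int c) {
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

// Hypothetical helper: case-insensitive compare that folds ASCII letters
// only. Returns 0 if equal, a negative value if s sorts before t, and a
// positive value otherwise.
static int ascii_strcasecmp(const char *s, const char *t) {
  for (;; ++s, ++t) {
    int a = ascii_tolower(static_cast<unsigned char>(*s));
    int b = ascii_tolower(static_cast<unsigned char>(*t));
    if (a != b) return a < b ? -1 : 1;
    if (a == 0) return 0;  // both strings ended at the same position
  }
}
```

With something like this, a tag check such as ascii_strcasecmp(buf, "I") == 0
matches both "i" and "I" whatever LC_CTYPE is set to, while a UTF-8 encoded
U+0130 or U+0131 is left alone and simply does not match either.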

_______________________________________________
fltk mailing list
[email protected]
http://lists.easysw.com/mailman/listinfo/fltk
