Ian wrote:
> On 7 Dec 2012, at 02:39, corvid wrote:
>
> > I was just having a look at how that was all resolved (I was on vacation
> > at the time, and paid very little attention to the goings-on)...
> >
> > I see that there's an fl_ascii_strcasecmp() now, and that it's used
> > to check the schemes and something involving xft fonts, but why wasn't
> > it applied more generally? For instance, I see that Fl_Help_View.cxx
> > still has code like 'if (strcasecmp(buf, "I") == 0' when checking tags.
>
> I would not describe the issue as fixed, in any real sense - I'm not sure it
> is readily "fixable" from within fltk anyway.
>
> For those following along at home, I'll just recap that the problem is that
> in the Turkic (and closely related) locales, there are actually two types of
> letter i/I, a dotted i and a non-dotted one...
>
> Now, in *most* locales, the non-dotted form only exists as the capitalisation
> of the dotted form, but in the Turkic forms a small dotted i is capitalised
> as a BIG dotted i, and the BIG non-dotted I has a lower case, non-dotted form.
>
> But... practically all the letter-case-handling tools in (well, any computer
> language really...) e.g. toupper/tolower/strcasecmp/etc. assume that the
> upper case form of i is I.
>
> So... in the Unicode there are specific codes for the Turkic-style upper and
> lower case dotted/non-dotted I glyphs, and a Unicode aware version of
> strcasecmp (for example) might actually do the Right Thing.
>
> However... it turns out that most (all?) tools, even in the Turkic locales,
> just use the "ASCII" values for i and I, thus reducing to almost zero any
> prospect of the code correctly figuring out which glyph to use for the upper
> / lower case conversion...
>
> Outcome: I don't think this can be fixed in fltk alone, I suspect that more
> "global" change might be required.
>
> I imagine that a program that needs to handle Turkic-style locales could
> "sanitize" its input text by scanning through it, looking for any occurrences
> of the ASCII i/I values and for each, determining from its context which type
> of i it *should* be and replacing with the full Unicode value as appropriate.
> For each letter i found, there are three possible outcomes:
>
> - It might be a Turkic dotted I
> - It might be a Turkic non-dotted I
> - It might actually be an ASCII "regular" I
>
> That third case *probably* covers the Fl_Help_View situation, where scanning
> tags in an html view is probably assumed to be (essentially) ASCII text, not
> localised text in another (e.g. Turkic) locale...
>
> Does that answer the question...?

Their 'i' and 'I' are just ordinary ASCII 'i' and 'I'. Those don't have
separate code points.

As for whether a bit of code is in a context where it wants to think about
text based on locale or in an ASCII context, most cases will be unambiguous,
and those can be fixed. FLTK does a small enough amount of
str[n]casecmp()ing overall that... I'll go ahead and do some analysis...
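
For the unambiguous ASCII contexts (HTML tags, scheme names), the comparison
can be made independent of the current locale by folding only the 26 ASCII
capitals by hand. Here is a minimal sketch of that idea. It is not a copy of
FLTK's fl_ascii_strcasecmp(), and the names ascii_lower and ascii_strcasecmp
are made up for illustration:

```cpp
// Hypothetical helper (not FLTK code): lower-case only 'A'..'Z'.
// It never consults the locale, so 'I' always maps to 'i'.
static inline int ascii_lower(unsigned char c) {
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

// Compare two NUL-terminated strings, ignoring case for ASCII letters only.
// Returns <0, 0 or >0 in the same spirit as strcmp().
// Bytes >= 0x80 (e.g. UTF-8 sequences) are compared unchanged.
int ascii_strcasecmp(const char *a, const char *b) {
  for (;; ++a, ++b) {
    int ca = ascii_lower(static_cast<unsigned char>(*a));
    int cb = ascii_lower(static_cast<unsigned char>(*b));
    if (ca != cb) return ca - cb;  // first differing byte decides
    if (ca == 0) return 0;         // both strings ended together
  }
}
```

With something like this, a tag check such as ascii_strcasecmp(buf, "I") == 0
would match "i" and "I" whatever the locale, and would not match the UTF-8
encodings of the dotless or dotted Turkic letters.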

