Re: [fltk.general] LC_CTYPE, strcasecmp

Ian MacArthur Tue, 18 Oct 2011 14:57:16 -0700

On 18 Oct 2011, at 19:19, corvid wrote:
>> 
>> Maybe rather than setting the environment to tr_TR you should try
>> setting tr_UTF8 (or whatever the syntax is, I never remember) and see if
>> that plays out nicer, given that fltk will now be *trying* to work with
>> UTF8 encoded strings rather than tr_TR "codepage" style stuff?
>> 
>> Or am I talking nonsense again?
> 
> I mean that, suppose someone runs
> $ export LC_CTYPE=tr_TR [or tr_TR.UTF-8]
> $ dillo [or unittests]
> ..and setlocale(LC_CTYPE, "") pulls the tr_TR from the environment
> and sets the locale, and then strcasecmp("i", "I") in Turkish is
> nonzero, whereas strcasecmp("i", "İ") or strcasecmp("I", "ı") are zero.


OK... Turkish is way off my beat, so I am probably missing the point; I don't 
know what would be considered normal.

For those following along at home, Turkish has two distinct forms of the letter 
I, one is dotted (always, even when capitalised as İ) the other is not-dotted 
(even when lower case, ı), and they make distinct (related) sounds.

(And that's all going to look like gibberish if your mail reader can't handle 
the glyphs I just wrote...)


Questions that might be relevant are...

What does strcasecmp() make of the ("i", "ı") or ("I", "İ") cases?
Presumably they are declared as non-matches too?
So it is not the case that is at issue here, but the fact that strcasecmp() 
thinks that dotted/non-dotted I letters are distinct?
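
One way to answer those questions empirically would be a small probe like the one below. It is only a sketch: casecmp_in_locale() is a made-up helper name, not anything in fltk or dillo, and whether a locale such as "tr_TR.UTF-8" is installed (and what it is called) depends on the system. It switches LC_CTYPE, calls strcasecmp(), and puts the previous locale back.

```c
#define _POSIX_C_SOURCE 200809L  /* for strcasecmp() in <strings.h> */
#include <locale.h>
#include <stdio.h>
#include <strings.h>

/* Hypothetical probe: run strcasecmp(a, b) under the named LC_CTYPE locale,
 * then restore the previous locale. Returns 0 and stores the comparison in
 * *result on success, or -1 if the requested locale is not available. */
int casecmp_in_locale(const char *locale, const char *a, const char *b,
                      int *result)
{
    char saved[128];
    const char *current = setlocale(LC_CTYPE, NULL);

    /* Copy the name: the string returned by setlocale() may be
     * overwritten by a later call. */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;

    *result = strcasecmp(a, b);
    setlocale(LC_CTYPE, saved);
    return 0;
}
```

Calling it with "tr_TR.UTF-8" (or plain "tr_TR") and the pairs ("i", "I"), ("i", "ı") and ("I", "İ") would show what a given libc actually does. I would not assume the answers in advance: as far as I know strcasecmp() works byte by byte, so the multi-byte UTF-8 encodings of İ and ı may not be folded at all.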

Are they generally considered distinct? What happens when a text that is in 
a non-Turkish Latin script is parsed by a Turkish system?
In that case, it might be correct to parse i and I as equivalent (that's 
dotted-small-i and non-dotted-caps-I) since they probably are equivalent in the 
source language.
This is (I think) the use-case Corvid is thinking about...

Are there parallels in other languages that are pertinent?
How are O and Ö handled (or U and Ü I guess) in languages that use them?
Are they sorted as "the same" or as different letters?



Corvid - is there a specific use that's problematic for you?

From your earlier post, I take it that what is happening is that you are doing 
something like (dodgy pseudo-code...)

   - find html <token>
   - if strcasecmp(token, "I") == 0 then start italic_mode
   - etc...

except that under a Turkish locale this fails because i != I there, whereas 
under most languages using Latin characters the two would compare equal?

Would a workaround of doing 

   - find html <token>
   - if strcmp(token, "i") == 0 || strcmp(token, "I") == 0 then start italic_mode
   - etc...

be totally out of the question? That at least should work for this particular 
case...
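
A more general form of the same workaround, sketched under the assumption that html tag names only ever need ASCII case folding, would be a comparison that never consults the locale at all. The names ascii_tolower() and ascii_strcasecmp() are invented for illustration; I am not claiming fltk or dillo provide them.

```c
/* Fold only ASCII 'A'..'Z' to lower case; every other byte is left alone.
 * Unlike tolower(), this never consults the current locale. */
static int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: compare two strings, ignoring case for ASCII
 * letters only. Returns <0, 0 or >0 in the same way as strcmp(). */
int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    /* Advance while both bytes fold to the same value and a has not ended. */
    while (*pa && ascii_tolower(*pa) == ascii_tolower(*pb)) {
        pa++;
        pb++;
    }
    return ascii_tolower(*pa) - ascii_tolower(*pb);
}
```

With that, ascii_strcasecmp(token, "i") == 0 should match both "i" and "I" whatever LC_CTYPE is set to, while the Turkish İ and ı, being non-ASCII bytes in either encoding, are compared exactly and never fold to plain i.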




