Follow-up Comment #6, bug #15377 (project freeciv): The situation is pretty bad.
The character functions in support.c are used in a lot of places. Some uses are legitimate (like parsing the registry files or capabilities), but others operate on UTF-8 strings, which is very wrong. More problematic, they are used in functions like fcstrcasecmp, remove_leading_trailing_spaces, and so on. These in turn are used in some legitimate locations (again, registry file parsing) but elsewhere are used on UTF-8 strings. For instance, fcstrcasecmp is used on player names, which may be UTF-8, but it compares byte by byte, lower-casing each byte. This is very, very wrong (though thankfully it won't cause a crash; only functions that modify strings are likely to cause major problems).

I really see no clear way out of this problem. tolower can't be used on UTF-8 strings with any validity, and without it an fcstrcasecmp function would be extremely challenging, to say the least. Elsewhere, going function by function, it is extremely hard to know which strings are UTF-8 and which are straight ASCII, even within the registry code.

Unless we're willing to rework all the core code to use UCS-2 or UCS-4 as the internal encoding, I don't think it's possible to ensure bug-free behavior. The best we can do is fix places on a case-by-case basis when we encounter problems.

_______________________________________________________

Reply to this item at:

  <http://gna.org/bugs/?15377>
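
To make the byte-by-byte problem described above concrete, here is a minimal sketch. bytewise_casecmp is a hypothetical stand-in written for illustration, not the actual fcstrcasecmp from support.c, and is_ascii_only is likewise an assumed helper showing one way a case-by-case fix could check whether the byte-wise path is safe for a given string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a byte-wise case-insensitive comparison in the
 * style described in this comment. NOT the real fcstrcasecmp. */
int bytewise_casecmp(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);

  for (;; a++, b++) {
    /* tolower() only knows about single bytes. It cannot see that two
     * bytes together encode one UTF-8 character. */
    int ca = tolower((unsigned char) *a);
    int cb = tolower((unsigned char) *b);

    if (ca != cb) {
      return ca - cb;
    }
    if (ca == 0) {
      return 0; /* both strings ended at the same position */
    }
  }
}

/* Hypothetical guard: returns nonzero only if every byte is 7-bit ASCII,
 * i.e. the byte-wise functions above are safe to use on this string. */
int is_ascii_only(const char *s)
{
  assert(s != NULL);

  for (; *s != '\0'; s++) {
    if ((unsigned char) *s >= 0x80) {
      return 0;
    }
  }
  return 1;
}
```

In the default "C" locale, bytewise_casecmp("Hello", "hELLO") returns 0 as expected, but comparing "\xC3\x89" (UTF-8 for É) with "\xC3\xA9" (UTF-8 for é) returns nonzero, so two names that differ only in case are treated as different. Under a single-byte locale the result can be worse, since tolower may then map individual bytes of a multi-byte sequence, which is where functions that modify strings become dangerous.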