Follow-up Comment #6, bug #15377 (project freeciv):

The situation is pretty bad.

The character functions in support.c are used in a lot of places: some legit
(like parsing of the registry files or capabilities), and some on UTF-8
strings, which is very wrong.

More problematic, they are used in functions like fcstrcasecmp,
remove_leading_trailing_spaces, and so on.  These in turn are used in some
legit locations (again, registry file parsing) but in other places are used
on UTF-8 strings.  For instance, fcstrcasecmp is used on player names, which
may be UTF-8, but it compares byte-by-byte, lower-casing each byte.  This is
very, very wrong (though thankfully it won't cause a crash; only functions
that modify strings are likely to cause major problems).
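
To make the failure concrete, here is a minimal sketch of a byte-wise
case-insensitive compare.  The name bytewise_casecmp is made up for
illustration; this is not the actual fcstrcasecmp code from support.c, only
the general pattern described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a byte-wise case-insensitive compare.
 * This is NOT the real fcstrcasecmp from support.c. */
static int bytewise_casecmp(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  for (;; a++, b++) {
    /* tolower() sees one byte at a time, never a whole character. */
    int ca = tolower((unsigned char)*a);
    int cb = tolower((unsigned char)*b);

    if (ca != cb) {
      return ca - cb;
    }
    if (ca == '\0') {
      return 0;
    }
  }
}
```

In the default "C" locale, comparing "\xc3\x89mile" with "\xc3\xa9mile" (the
UTF-8 encodings of "Émile" and "émile") returns nonzero: the two names differ
only in the second byte (0x89 vs 0xA9), and tolower leaves both bytes alone.
Under a single-byte locale tolower may map bytes above 0x7F, so the result
there can differ.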

I really see no clear way out of this problem.  tolower can't be used on
UTF-8 strings with any validity, and without it a fcstrcasecmp function would
be extremely challenging to write, to say the least.  Elsewhere, going
function by function, it is extremely hard to know which strings are UTF-8
and which are plain ASCII, even within the registry code.
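
For the legit ASCII-only callers, one partial option (an assumption on my
part, not something the current code does) would be a compare that folds only
'A'..'Z' and passes every byte >= 0x80 through untouched.  It never treats
non-ASCII letters as equal across case, but it also never depends on the
locale.  The names ascii_tolower_byte and ascii_casecmp are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lower-case only ASCII 'A'..'Z'.
 * Bytes >= 0x80 (UTF-8 lead and continuation bytes) are returned as-is. */
static unsigned char ascii_tolower_byte(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical ASCII-only case-insensitive compare; locale-independent. */
static int ascii_casecmp(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  for (;; a++, b++) {
    unsigned char ca = ascii_tolower_byte((unsigned char)*a);
    unsigned char cb = ascii_tolower_byte((unsigned char)*b);

    if (ca != cb) {
      return (int)ca - (int)cb;
    }
    if (ca == '\0') {
      return 0;
    }
  }
}
```

This is fine for registry keys and capability strings, but it does not solve
the player-name case: "Émile" and "émile" still compare as different.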

Unless we're willing to rework all the core code to use UCS-2 or UCS-4 as the
internal encoding, I don't think it's possible to ensure bug-free behavior.
The best we can do is fix places on a case-by-case basis when we encounter
problems.
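
As a rough idea of what moving to UCS-4 internally would involve, every UTF-8
string would have to be decoded into code points before any per-character
function touches it.  The sketch below decodes a single sequence; the name
utf8_decode_one is hypothetical and nothing like it is claimed to exist in
the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s into *out.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Rejects overlong forms, surrogates, and values above U+10FFFF. */
static size_t utf8_decode_one(const char *s, uint32_t *out)
{
  const unsigned char *p = (const unsigned char *)s;
  uint32_t cp, min;
  size_t len, i;

  assert(s != NULL && out != NULL);
  if (p[0] < 0x80) {
    *out = p[0];                /* plain ASCII, including the terminator */
    return 1;
  } else if ((p[0] & 0xE0) == 0xC0) {
    cp = p[0] & 0x1F; len = 2; min = 0x80;
  } else if ((p[0] & 0xF0) == 0xE0) {
    cp = p[0] & 0x0F; len = 3; min = 0x800;
  } else if ((p[0] & 0xF8) == 0xF0) {
    cp = p[0] & 0x07; len = 4; min = 0x10000;
  } else {
    return 0;                   /* stray continuation or invalid lead byte */
  }

  for (i = 1; i < len; i++) {
    if ((p[i] & 0xC0) != 0x80) {
      return 0;                 /* also stops at an early NUL terminator */
    }
    cp = (cp << 6) | (p[i] & 0x3F);
  }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    return 0;
  }
  *out = cp;
  return len;
}
```

Decoding is the easy part.  Case folding on the resulting code points still
needs Unicode tables or a library, which is where the real rework would be.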

    _______________________________________________________

Reply to this item at:

  <http://gna.org/bugs/?15377>
