[Gnash-dev] Character encoding design

Benjamin Wolsey Sat, 18 Oct 2008 04:02:01 -0700

The present implementation of UTF-8 character handling is not very good.
Strings are kept as UTF-8 std::strings and converted (often in their
entirety) to std::wstring for every string operation, then sometimes
converted back again when the operation changed the string.
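
To make the cost concrete, here is a minimal sketch of that round trip. It
is not the Gnash code: decodeUtf8, encodeUtf8 and substrChars are
hypothetical names, and the decoder assumes well-formed input and a wchar_t
wide enough for the code points involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 into one wchar_t per code point.
// Assumes well-formed input; no validation is attempted.
std::wstring decodeUtf8(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes expected
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Hypothetical helper: encode code points back into UTF-8.
std::string encodeUtf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// A substring by character index: the whole string is decoded, and the
// result is encoded again, even though only a few characters are wanted.
std::string substrChars(const std::string& utf8,
                        std::size_t start, std::size_t len)
{
    const std::wstring wide = decodeUtf8(utf8);
    if (start > wide.size()) return std::string();
    return encodeUtf8(wide.substr(start, len));
}
```

Every call to something like substrChars pays for a full decode of the
input plus an encode of the result, which is the overhead described above.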


The second problem is that the standard library's support for case
conversion requires a locale. We don't (and can't) know which locales are
installed on a machine, and we can only guess whether the locale Gnash
is run under is capable of case conversion for non-ASCII characters.
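
As a rough illustration of the locale dependency, the sketch below uses the
standard std::ctype<wchar_t> facet. The helper names toUpper and
localeOrClassic are hypothetical, and the locale name passed in is only
whatever the caller hopes is installed.

```cpp
#include <cstddef>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: upper-case a wide string with the ctype facet of
// the given locale. What happens to non-ASCII characters depends entirely
// on that locale.
std::wstring toUpper(const std::wstring& in, const std::locale& loc)
{
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out(in);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = facet.toupper(out[i]);
    }
    return out;
}

// Hypothetical helper: try a named locale and fall back to the classic
// "C" locale if it is not installed. std::locale's constructor throws
// std::runtime_error for a name it cannot resolve.
std::locale localeOrClassic(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

ASCII input is handled by any locale, but whether a character such as
U+00E9 is converted depends on which locale could actually be constructed,
and the program has no portable way of knowing that in advance.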

Finally, embedded systems sometimes save space by not implementing
wstring.

We need to provide a better way of doing this, preferably a modular one
so that the implementation can change easily.

There are notes at http://wiki.gnashdev.org/CharacterEncoding

I favour using a custom string class in libcore only, because it can
solve all the problems most efficiently. The disadvantage is that it
requires changes to the core (mostly minor changes, but lots of them). I
have tested such an implementation locally and it works (including lots
more swfdec testsuite passes) except for limitations imposed by libICU's
correctness.

The other difficulty is finding a way of enforcing the interface. Dynamic
polymorphism seems too costly for a class that is used so frequently,
especially when the implementation should be chosen at compile time.
Static polymorphism isn't particularly easy to achieve.
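
One possible shape for the static approach is the curiously recurring
template pattern (CRTP). The sketch below is only an illustration, not the
implementation I tested: StringInterface, AsciiString and CoreString are
hypothetical names, and the ASCII-only backend stands in for a real one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical interface. Calls are forwarded to the implementation class
// through a static_cast, so no virtual dispatch is involved.
template <typename Impl>
class StringInterface
{
public:
    std::size_t length() const { return self().lengthImpl(); }
    Impl toUpper() const { return self().toUpperImpl(); }

private:
    const Impl& self() const { return static_cast<const Impl&>(*this); }
};

// Hypothetical backend that only understands ASCII.
class AsciiString : public StringInterface<AsciiString>
{
public:
    explicit AsciiString(const std::string& s) : _data(s) {}

    std::size_t lengthImpl() const { return _data.size(); }

    AsciiString toUpperImpl() const
    {
        std::string out(_data);
        for (std::size_t i = 0; i < out.size(); ++i) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = static_cast<char>(out[i] - 'a' + 'A');
            }
        }
        return AsciiString(out);
    }

    const std::string& bytes() const { return _data; }

private:
    std::string _data;
};

// Core code would refer only to this typedef; the backend behind it is
// selected at build time.
typedef AsciiString CoreString;
```

A backend that lacks lengthImpl or toUpperImpl fails to compile, but only
at the point where the corresponding interface function is first used,
which is part of why this is less convenient than a pure virtual base
class.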

I'd be grateful for any thoughts on the design.

bwy


_______________________________________________
Gnash-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnash-dev
