Re: Encoding issues

Andrew Dunbar Sat, 02 Nov 2002 19:22:18 -0800

 --- Christian Biesinger <[EMAIL PROTECTED]> wrote:
> Hi,

Hi Christian.


> so you may remember that some time ago, I checked in
> a patch to change the encoding that AP_DiskStringSet
> uses to whatever XAP_App::getDefaultEncoding uses
> (or something like that, can't remember what exactly
> I did :) ).
> 
> Anyway, it looks like this broke non-US-ASCII
> characters in the statusbar, because of this piece
> of code in ap_StatusBar.cpp, line 493, in
> AP_StatusBar::setStatusMessage(const char * pBuf,
> int redraw)
>       UT_UCS4_strcpy_char(bufUCS,pBuf);
> 
> That function just uses the encoding that the
> default constructor of mbtowc thinks is good as the
> source encoding. That seems to be ISO-8859-1 for me.
> However, due to the patch I mentioned above, that
> string is already in UTF-8.
> This means that the statusbar will not display
> special characters (like, but not limited to, German
> umlauts) correctly. Instead, it will show characters
> looking like undecoded UTF-8 (like ÃŒ)
> 
> So... the question is:
> What's the best way for fixing this?
> Should UT_UCS4_strcpy_char take an additional (maybe
> optional) argument, specifying the charset to
> convert from? AP_StatusBar would pass the result of
> XAP_App::getDefaultEncoding to it, and this would
> work...

There is only one way to fix this.  We are software
engineers here.  Guesswork is not, and never has been,
an integral part of what engineers do.  What we
should do is *find out* the *correct encoding* for
the destination we send a string to, *always*, and
use that encoding.  I've said it before and I'll say
it again: having a default constructor for mbtowc and
wctomb is just begging for bugs.  There should never
be a time when we convert a string without knowing
which encoding we want.  Would you go to a money
changer without knowing what currency or exchange rate
you want?  How on earth we're supposed to do better
than Microsoft when we leave these things open to
chance time after time is completely beyond me.
So maybe some people think encoding is a hard problem.
In that case, look through the code or ask on the list
before writing and committing code that is all
based on guesswork.

Sorry I got into a rant (:
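
To make the point about explicit encodings concrete, here is a
minimal sketch of a conversion routine that cannot be called
without naming the source encoding.  The names toUcs4 and
SourceEncoding are made up for illustration and are not existing
AbiWord API, and the UTF-8 decoder is deliberately simplified (it
does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names for illustration only; not AbiWord API.
enum class SourceEncoding { Latin1, Utf8 };

// Convert a byte string to UCS-4. The caller MUST say which encoding
// the bytes are in; there is no default to guess from.
std::vector<std::uint32_t> toUcs4(const std::string& bytes, SourceEncoding enc)
{
    std::vector<std::uint32_t> out;

    if (enc == SourceEncoding::Latin1) {
        // ISO-8859-1 maps each byte directly to the same code point.
        for (unsigned char c : bytes)
            out.push_back(c);
        return out;
    }

    // Simplified UTF-8 decoding (no overlong or surrogate checks).
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char t = static_cast<unsigned char>(bytes[i + k]);
            if ((t & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (t & 0x3F);
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Feeding the two UTF-8 bytes 0xC3 0xBC (u-umlaut) through this with
SourceEncoding::Utf8 gives the single code point U+00FC, while
SourceEncoding::Latin1 gives two code points, U+00C3 and U+00BC,
which is exactly the garbage Christian is seeing in the statusbar.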

Now the encoding needed by the status bar will depend
on the OS.  There should be functions in the
EncodingManager these days to give the encoding of the
OS and the encoding of the GUI.  I think the GUI
encoding is currently covered by something like
defaultSystemEncoding.  Experience with the XP
(cross-platform) code has shown that the user can
often set an encoding for himself.  On Unix this is
via the $LANG environment variable.  The system will
usually have an encoding
it likes to use for its own stuff.  This varies from
system to system.  On QNX, BeOS, and OS X this seems
to be UTF-8.  On Windows this can be set in the
Control Panel right next to where the user can set
his preferred locale.  There are APIs to get both.
In a Win32 Unicode build (which we don't yet support
but which we need), this will always be UCS-2 or
UTF-16.
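
For the Unix case, a rough sketch of pulling the codeset out of a
$LANG-style locale name might look like the following.  The
function name codesetFromLocaleName is hypothetical, and it assumes
the usual POSIX form language[_territory][.codeset][@modifier].
Real code would more likely call setlocale(LC_ALL, "") followed by
nl_langinfo(CODESET), which also takes into account LC_ALL and
LC_CTYPE overriding LANG.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not AbiWord API.
// Extracts the codeset part from a POSIX locale name of the form
//   language[_territory][.codeset][@modifier]
// Returns an empty string when the name carries no codeset.
std::string codesetFromLocaleName(const std::string& name)
{
    const std::size_t dot = name.find('.');
    if (dot == std::string::npos)
        return "";  // e.g. "C" or "de_DE@euro": nothing to extract

    // The codeset ends at the optional "@modifier", or at end of string.
    const std::size_t at = name.find('@', dot);
    const std::size_t end = (at == std::string::npos) ? name.size() : at;

    return name.substr(dot + 1, end - dot - 1);
}
```

An empty result means the locale name alone does not say which
encoding is in use, so the caller has to ask the system rather than
assume one.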

With the old Gnome and GTK, the GUI used an ISO
encoding, maybe depending on the default language.
With the new Gnome and GTK, the GUI *always* uses
UTF-8.  So the statusbar also must use UTF-8.
Perhaps it is now a good idea to add a new GUIEncoding
to the other encodings in the EncodingManager to make
it more obvious which one to use - especially since
it appears with new GTK/Gnome that it may be
different from the system encoding.
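
A sketch of what keeping the GUI encoding separate from the system
encoding could look like is below.  Everything here (EncodingSet,
GuiToolkit, describeEncodings) is a made-up illustration, not the
real EncodingManager interface, and it assumes that with the old
GTK the GUI simply follows the system encoding.

```cpp
#include <string>

// Hypothetical types for illustration; not the real EncodingManager.
enum class GuiToolkit { OldGtk, NewGtk };

struct EncodingSet {
    std::string systemEncoding;  // what the OS/locale uses
    std::string guiEncoding;     // what strings handed to the GUI must use
};

// Build the pair of encodings for a given toolkit generation.
// Assumption: old GTK follows the system encoding; new GTK is always UTF-8.
EncodingSet describeEncodings(GuiToolkit toolkit,
                              const std::string& systemEncoding)
{
    EncodingSet e;
    e.systemEncoding = systemEncoding;
    e.guiEncoding = (toolkit == GuiToolkit::NewGtk)
                        ? std::string("UTF-8")
                        : systemEncoding;
    return e;
}
```

The statusbar code would then convert to guiEncoding and never look
at systemEncoding, so the two can differ without anything breaking.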

Sorry for grumbling.  We still have encoding problems
popping up relatively frequently and also have wrong
fixes going in fairly often.  I hope this goes a
little way toward clearing up some of the confusion
and at least sheds some light on solving this one
immediate problem.

Andrew Dunbar.  Mr i18n (:

> Other ideas?
> 
> (Should I put this in bugzilla instead?) 

=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com
