Re: [I18n] questions about convert functions

Ivan Pascal Thu, 11 Mar 2004 05:30:03 -0800

  Hi,

> I googled for "big5-E0" for some documents on this encoding method,
> but get nothing helpful. As my intention is to understand how the wheel
> is rolling, so let's leave big5 stuff behind.


It is the method Emacs uses to convert Big5 into CTEXT.  It divides the whole
Big5 code range into two charsets with the escape sequences '\033$)0' and
'\033$)1'.  I don't know whether it matches any Chinese standard.
The cs2/cs3 charset records were added precisely to accept Emacs's CTEXT.
 
> I'm not so sure about the actual meaning of the following:
> 
> > never into cs2/cs3 charsets.  (Frankly speaking the wc_encoding values are
> > actually wrong in that locale description.  And if one tries to convert CTEXT
> > into WC directly and then draw that WC string it will fail, really.  But it's
> > a bug.  With MB or UTF8 strings it should not happen.)
> > 
> 1. What do you mean "wc_encoding values are wrong"?
> 2. WC means the internal WC format(with wc_encoding bit etc.) here,
> not the stdc WC, which is UCS4 actually. Am I right?

You remind me of one important point.  What WC actually means is controlled by
a special option in XLC_LOCALE, 'use_stdc_env'.
If the option is False (the default), an internal representation is used.
That means a converter takes bytes from the multibyte string, packs them into
a long variable ('wc_shift_bits' defines how far each next byte of the
multibyte char is shifted before it is put into the wide char variable), and
then adds the 'wc_encoding' value to that wide char.
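
A minimal sketch of the packing step described above, in C.  It is not the
Xlib source: the struct and the function name pack_wc are hypothetical, and
the sketch assumes the bytes are used as-is (a real converter may mask or
otherwise adjust them before packing).

```c
#include <stddef.h>

/* Hypothetical stand-in for one charset record of an XLC_LOCALE file. */
typedef struct {
    unsigned long wc_encoding;  /* tag bits added to every wide char */
    int length;                 /* bytes per character in this charset */
} CharsetSketch;

/*
 * Pack cs->length bytes into one wide char.  The accumulator is shifted
 * left by wc_shift_bits before each next byte is added, and the charset's
 * wc_encoding value is OR-ed in at the end.
 */
unsigned long
pack_wc(const unsigned char *mb, const CharsetSketch *cs, int wc_shift_bits)
{
    unsigned long wc = 0;
    int i;

    for (i = 0; i < cs->length; i++)
        wc = (wc << wc_shift_bits) | mb[i];
    return wc | cs->wc_encoding;
}
```

For example, with wc_shift_bits = 8, a two-byte char 0xA4 0x40 and a
wc_encoding of 0x00010000 give the wide char 0x0001A440.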

The reverse converter (from WC to something) first extracts the wc_encoding
bits (using 'wc_encoding_mask') and then searches for the CharSet with that
wc_encoding.  Of course, with such a scheme, if some CTEXT segment was
converted through cs2/cs3 into WC, the wide chars carry the wc_encoding of
those charsets (0x00010000 or 0x00020000).  And when the WC string is later
converted into something else, the converter finds that the chars belong to
cs2/cs3.
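
The lookup in the reverse direction can be sketched the same way.  Again this
is only an illustration under assumptions: CharsetTag and find_charset are
hypothetical names, and the table contents and the mask value are supplied by
the caller rather than taken from any real locale file.

```c
#include <stddef.h>

/* Hypothetical charset record: a name plus its wc_encoding tag. */
typedef struct {
    const char *name;
    unsigned long wc_encoding;
} CharsetTag;

/*
 * Extract the tag bits from a wide char with wc_encoding_mask and return
 * the first charset whose wc_encoding equals them, or NULL if none does.
 */
const CharsetTag *
find_charset(unsigned long wc, unsigned long wc_encoding_mask,
             const CharsetTag *table, size_t n)
{
    unsigned long tag = wc & wc_encoding_mask;
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].wc_encoding == tag)
            return &table[i];
    return NULL;
}
```

A wide char that was produced through cs2 or cs3 keeps that charset's tag, so
this lookup returns cs2 or cs3 again, which is the behaviour described above.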

But if 'use_stdc_env' is True (and it really is in that locale description),
another converter is used.  This converter first converts CTEXT into
multibyte and then calls libc's mbtowc() function for each char.  The reverse
converter first converts WC into multibyte using libc's wctomb() and uses
that multibyte string for the subsequent conversion steps.
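
A rough sketch of the libc half of that path, in C.  The CTEXT <-> multibyte
step is left out, and the function names mb_to_wc_string and wc_to_mb_string
are hypothetical.  Only mbtowc() and wctomb() are real libc calls.  A real
program would call setlocale() first so that libc uses the locale's encoding;
without it the "C" locale applies.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert a multibyte string to wide chars by calling mbtowc() once per
 * character.  Returns the number of wide chars written, or -1 on an
 * invalid sequence.
 */
int
mb_to_wc_string(const char *mb, size_t len, wchar_t *out, size_t out_cap)
{
    size_t n = 0;

    (void)mbtowc(NULL, NULL, 0);          /* reset the shift state */
    while (len > 0 && n < out_cap) {
        int used = mbtowc(&out[n], mb, len);
        if (used < 0)
            return -1;
        if (used == 0)                    /* reached a null character */
            break;
        mb += used;
        len -= (size_t)used;
        n++;
    }
    return (int)n;
}

/*
 * Convert wide chars back to multibyte by calling wctomb() once per
 * character.  Returns the number of bytes written, or -1 on error or
 * if the output buffer is too small.
 */
int
wc_to_mb_string(const wchar_t *wc, size_t n, char *out, size_t out_cap)
{
    char buf[MB_LEN_MAX];
    size_t pos = 0, i;

    (void)wctomb(NULL, 0);                /* reset the shift state */
    for (i = 0; i < n; i++) {
        int len = wctomb(buf, wc[i]);
        if (len < 0 || pos + (size_t)len > out_cap)
            return -1;
        memcpy(out + pos, buf, (size_t)len);
        pos += (size_t)len;
    }
    return (int)pos;
}
```

Note that nothing in this path reads wc_encoding: the wide char values are
whatever libc produces for the current locale.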

> 3. You have that conclusion for some operating test, or reading from
> source code?
I dug through the converters' source code some years ago and am now just
telling you what I remember. :)

> As I comprehend, if CTEXT convert to WC directly, then the WC string remain
> the original charset information, which may be cs2/cs3. This will cause
> the failure in drawing text. Am I right?

Yes, you are right.
That is what I meant.  But since you reminded me about 'use_stdc_env', I think
the wc_encoding values are simply unused there and mean nothing. :)

-- 
 Ivan U. Pascal         |   e-mail: [EMAIL PROTECTED]
   Administrator of     |   Tomsk State University
     University Network |       Tomsk, Russia
_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
