Re: [Fwd: The design of Encoding class]

Asger Alstrup Nielsen Wed, 31 Mar 1999 16:40:26 -0500
> But I still don't understand why you want to choose LString instead of the C++
> string here. The C++ string is optimal for any encoding, whereas using
> LString adds a little overhead for adapting to the 16-bit LString.

The reason to choose LString is to keep things uniform.  We don't want to manage
different kinds of strings according to what encoding we are reading.

Notice that we might have to read some files that are in a wide encoding.

> Ummm......, you may be right, but I want to avoid using Unicode as a middle
> layer if possible.

Notice that it's entirely possible to only provide one way to Unicode from any
encoding.  If encoding A can be converted to encoding B, and encoding B can be
converted to Unicode, we automatically get conversion from encoding A to
Unicode.
So you typically only need to implement the conversion to Unicode for the
fixed-width encoding.
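
A minimal sketch of that chaining idea, in C++.  WideString, Converter, chain,
packPairs and addOffset are all hypothetical names made up for illustration;
they are not the LyX classes, and the two toy converters only stand in for real
"A to B" and "B to Unicode" steps.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-ins; these are not the actual LyX types.
using WideString = std::u16string;   // plays the role of a 16-bit LString
using Converter  = std::function<WideString(const WideString&)>;

// Build "A to Unicode" out of "A to B" and "B to Unicode".
Converter chain(Converter aToB, Converter bToUnicode)
{
    return [aToB, bToUnicode](const WideString& s) {
        return bToUnicode(aToB(s));
    };
}

// Toy "A to B": merge each pair of byte-sized units into one 16-bit unit.
WideString packPairs(const WideString& s)
{
    WideString out;
    for (std::size_t i = 0; i + 1 < s.size(); i += 2)
        out += static_cast<char16_t>((s[i] << 8) | (s[i + 1] & 0xff));
    return out;
}

// Toy "B to Unicode": a made-up fixed offset standing in for a real table.
WideString addOffset(const WideString& s)
{
    WideString out;
    for (char16_t c : s)
        out += static_cast<char16_t>(c + 1);
    return out;
}
```

With these, chain(packPairs, addOffset) behaves as a single converter from the
variable-width form to the final one, so only the second step has to know about
Unicode at all.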

> In conclusion, if I want to implement toUnicode and fromUnicode for any Asian
> language, I need about 200K of memory to do this. However, as you say, this
> action could be optimized by merging these two substeps. We can translate from
> a variable-width encoding to a fixed-width encoding with the following simple
> routine.

If you choose the same setup as the iso8859-x family encodings in the CVS tree,
you can get away with much less space:  The number of glyphs in the encoding
times three.
This kind of conversion is efficient enough for the purpose that it is meant
for.
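
A rough sketch of what such a table could look like.  The layout is an
assumption, not copied from the cvs tree: one reading that matches "glyphs
times three" is one byte for the code plus two bytes for the 16-bit Unicode
value.  The names kCodes, kUnicode, toUnicode and fromUnicode are made up here,
and the three sample entries are taken from ISO 8859-2.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: parallel arrays, so each glyph costs
// 1 byte (code) + 2 bytes (Unicode) = 3 bytes.
const std::uint8_t  kCodes[]   = {0xA1,   0xA3,   0xA6};
const std::uint16_t kUnicode[] = {0x0104, 0x0141, 0x015A};
const std::size_t   kCount     = sizeof(kCodes) / sizeof(kCodes[0]);

// Encoding to Unicode: codes below 0xA0 are identical in ISO 8859-x and
// Unicode, everything else is looked up in the table.
std::uint16_t toUnicode(std::uint8_t code)
{
    if (code < 0xA0)
        return code;
    for (std::size_t i = 0; i < kCount; ++i)
        if (kCodes[i] == code)
            return kUnicode[i];
    return 0xFFFD;   // replacement character: not in this toy table
}

// Unicode to encoding: the same table searched the other way round.
std::uint8_t fromUnicode(std::uint16_t u)
{
    if (u < 0xA0)
        return static_cast<std::uint8_t>(u);
    for (std::size_t i = 0; i < kCount; ++i)
        if (kUnicode[i] == u)
            return kCodes[i];
    return '?';      // no equivalent in this encoding
}
```

The linear search keeps the sketch short; a sorted table with a binary search
would use the same space.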

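> LString EncBig5::toFixWidthEncoding(LString s)
> {
>     LString l("");
>     LString::const_iterator i;
> 
>     for(i=s.begin();i!=s.end();i++) {
>         // Bytes 0xa1..0xfe are lead bytes: pair each with its trail byte.
>         // (Assumes *i yields an unsigned value.)
>         if ((*i>=0xa1)&&(*i!=0xff)&&(i+1!=s.end())) {
>             unsigned int lead = *i++;
>             l += LChar((lead<<8)|*i);
>          } else {
>             // Single-byte character: copy it unchanged.
>             l += LChar(*i);
>          }
>     }
>     return l;
> }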
>
> Similar routines can be written for Japanese, Korean and China, too. I
> suppose this is what you call an efficient converter. Using Unicode as a middle
> layer is beautiful in theory, but ugly in practice for Asian languages. For
> Western languages, it may be acceptable. This is why most Asian countries don't
> like Unicode. It's not designed primarily for us.
> 
> Should we add these two methods into the definition of EncodingConverter?

Yes, we want something like this.  Please go ahead and implement the
corresponding Big5 file if you feel like it.  Notice that S Miyata is working on
Asian encodings as well, and you might want to coordinate things with him.

Greets,

Asger