Re: [Materm-devel] A newcomer on the list + utf 8 support

Terminator Wed, 05 Dec 2007 21:55:25 -0800

On Dec 5, 2007 4:36 PM, Jehan <[EMAIL PROTECTED]> wrote:

> Oh I forgot to ask:
>
> >
> > The biggest issue from my point of view, is in screen.c. In the current
> > implementation, we assume each character is either 1 byte or 2 bytes (CJK
> > multichar). On screen, a 1-byte character only occupies the space of one
> > column and a 2-byte character occupies the space of exactly two columns.
> > Thus, if the window width is 80, it allows exactly 80 bytes of data
> > regardless of how many actual characters there are. In this case, for
> > each line, we allocated 80 bytes for the actual characters, and 80 words
> > for the rendering characteristics, such as color, bold font, CJK property
> > (first byte or second byte of a CJK character). But this mechanism will
> > be broken for UTF8 because each UTF8 character may use a different number
> > of bytes (from one to six), and its width on screen is also different!
> >
> > My suggestion (again, immature) is to redesign text_t and rend_t. For
> > text_t, we should allow multiple bytes. For rend_t, we should store the
> > actual width of the character on screen. The second is easy: only one
> > bit is needed if we assume each character uses up to 2 columns. But
> > the first needs a careful consideration: if we simply define text_t as
> > an array of 6 bytes, we may end up wasting a huge amount of memory!
> >
>
> Why, in the current implementation, should a 2-byte character occupy the
> space of 2 columns? I guess that is because many complicated characters
> (like the Asian ones) live in the 2-byte (or more) character planes, and
> these characters may need more room to display. But I think you can also
> have very small (or even invisible!) characters encoded with more than
> one byte.
> Or is it only some kind of approximate prediction, in order to get the
> big picture before actually displaying the strings (and then seeing their
> real size, potentially different from the prediction)?
>
The short answer is: in the current code, a two-byte character is guaranteed
to occupy two columns. In all CJK encodings supported by mrxvt, each CJK
character is stored in two bytes and is displayed on screen twice as wide as
an ASCII character. No character uses more than two bytes or occupies more
than two columns.
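
To make the "bytes equal columns" invariant concrete, here is a rough sketch
in C. It is not the actual screen.c code: the flag names, the function name
and the rule "a byte with the high bit set starts a two-byte character" are
all assumptions made for illustration (real encodings such as Big5 or SJIS
have more detailed lead-byte rules).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cell flags; NOT mrxvt's real rend_t bit layout. */
#define CELL_ASCII 0u  /* one byte, one column */
#define CELL_MB1   1u  /* first byte (left column) of a two-byte character */
#define CELL_MB2   2u  /* second byte (right column) of a two-byte character */

/*
 * Tag each byte of one screen line. Assumption: any byte with the high bit
 * set starts a two-byte character (a simplified, EUC-like rule).
 * Returns the number of columns used, which always equals the number of
 * bytes consumed, because every byte owns exactly one column.
 */
static size_t tag_cells(const unsigned char *line, size_t n,
                        unsigned char *flags)
{
    size_t i = 0;

    assert(line != NULL && flags != NULL);
    while (i < n) {
        if ((line[i] & 0x80) && i + 1 < n) {
            flags[i] = CELL_MB1;
            flags[i + 1] = CELL_MB2;
            i += 2;
        } else {
            /* ASCII, or a lone lead byte at the end of the line */
            flags[i] = CELL_ASCII;
            i += 1;
        }
    }
    return i;
}
```

With this layout, an 80-column line needs exactly 80 text bytes and 80 flag
entries no matter how many characters it holds. That is precisely the
property that stops holding once a character can take three or more bytes.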


Also keep in mind that X terminal emulators traditionally use fixed-width
columns, which requires a fixed-width (monospace) font. For CJK, we must use
a CJK font in which each character is exactly twice as wide as a character
in the ASCII font, so that ASCII text and CJK text line up when displayed
together. encoding.h shows this clearly: each encoding has two lists of
fonts, NFONT_LIST_ENC (for ASCII text) and MFONT_LIST_ENC (for CJK text).
The width of each MFONT is exactly twice that of the corresponding NFONT.
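
As a small illustration of why the 2:1 width ratio matters, the sketch below
shows the arithmetic a fixed-width grid relies on. The function names are
hypothetical and the widths are plain integers in pixels; nothing here is
taken from mrxvt's font-loading code.

```c
#include <assert.h>

/* Hypothetical helper: left pixel edge of a column in a fixed-width grid. */
static int col_to_x(int col, int cell_width)
{
    assert(col >= 0 && cell_width > 0);
    return col * cell_width;
}

/*
 * Hypothetical check: a CJK font fits the grid only if each of its glyphs
 * spans exactly two ASCII cells.
 */
static int fonts_compatible(int nfont_width, int mfont_width)
{
    return mfont_width == 2 * nfont_width;
}
```

For example, with a 7-pixel ASCII font, a 14-pixel CJK font keeps every
character on a column boundary, while a 13-pixel one would drift by one pixel
per CJK character.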

Hope this answers your question.


_______________________________________________
Materm-devel mailing list
Materm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/materm-devel
mrxvt home page: http://materm.sourceforge.net
