Re: [Materm-devel] A newcomer on the list + utf 8 support

Terminator Wed, 05 Dec 2007 08:47:10 -0800

Hi, Jey,

Welcome to the team!

On Dec 4, 2007 7:17 PM, Jehan <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I am new, so I present myself: my name is Jehan, alias Jey.
> I am interested in mrxvt because I have used it for some years now. But
> the utf8 support takes time to arrive, so I am sad. :-(
>

Yeah, same here. :-(

>
> Anyway then I was wondering if someone was working on it. If not, I
> would propose to "try". I never developed on terminal emulation, so this
> is a new topic for me. But as I really want to keep mrxvt and in the
> same time, I want utf8, I thought it could be nice to participate to the
> development. I won't promise to succeed, but at least I can try to.
>

That's exactly the attitude I had when I started the mrxvt project. With
your and David Morris's help, we may achieve the same success again. :-)

> Then my question is: what is there to know about the code structure to
> facilitate my work? I have looked for the code for the last 2 hours. I
> have made some points about some parts but this remains misty about many
> of the functioning. I can of course work more on it to discover more of
> the program's logic, but I thought the easier was to have hints to where
> to look from the main developers. During the day I have a job, and the
> evening I have many activities. So the more help, the better. :-)
>

From my understanding of an X terminal emulator (again, as incomplete as
yours :-)), here is how it works: the emulator accepts input from the
keyboard (through X keyboard events) and sends it to the slave process
(typically a shell); it also accepts output from the slave process and
prints it on the screen. So the terminal emulator must listen on two input
channels: the X keyboard events and the slave process's stdout/stderr. It
redirects one input channel (X keyboard events) to the slave process as
well as echoing the input on the screen, and it renders the data from the
other input channel (the slave process) on the screen.

In reality, there may be some other inputs, like mouse events. Let's forget
about them for now.
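
To make the "two input channels" idea concrete, here is a minimal sketch in
C. It is not taken from the mrxvt sources: wait_for_input and forward_bytes
are names I made up, and plain file descriptors stand in for the real
channels. In an actual X program the first descriptor would come from
ConnectionNumber(display) and the events would be fetched with XNextEvent();
the second would be the master side of the pty the shell runs on.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Bit flags telling the caller which channel has data (hypothetical names). */
enum { READY_KEYBOARD = 1, READY_CHILD = 2 };

/* Wait up to timeout_ms for input on either channel.
 * x_fd   : stands in for the X connection (keyboard events)
 * pty_fd : stands in for the pty master (output of the slave process)
 * Returns a mask of READY_* flags, or 0 on timeout or error. */
static int wait_for_input(int x_fd, int pty_fd, int timeout_ms)
{
    struct pollfd fds[2];
    int mask = 0;

    assert(x_fd >= 0 && pty_fd >= 0);
    fds[0].fd = x_fd;   fds[0].events = POLLIN; fds[0].revents = 0;
    fds[1].fd = pty_fd; fds[1].events = POLLIN; fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) <= 0)
        return 0;
    if (fds[0].revents & POLLIN)
        mask |= READY_KEYBOARD;
    if (fds[1].revents & POLLIN)
        mask |= READY_CHILD;
    return mask;
}

/* Copy whatever is pending on `from` to `to` in a single read/write pair.
 * Returns the number of bytes written, or -1 if nothing could be read.
 * A real emulator would loop on short writes; this sketch does not. */
static long forward_bytes(int from, int to)
{
    char buf[256];
    ssize_t n = read(from, buf, sizeof buf);

    if (n <= 0)
        return -1;
    return (long)write(to, buf, (size_t)n);
}
```

The main loop would then call wait_for_input() repeatedly, send keyboard
data to the slave process when READY_KEYBOARD is set, and hand the slave's
output to the screen code when READY_CHILD is set.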

GI, please feel free to correct me if you find anything wrong. :-)

> What I logically guess about a new encoding support is that I would have
> two steps to modify:
>
> 1/ when an input arrive (whether it is keyboard or text paste probably,
> through mouse, menu or whatever), and if the locale is set to utf8 of
> course, I shall transform some keyboard signal (?) or the pasted text to
> the utf-8 encoding and send it to the running program's stdin...
>

From my understanding, the input should be in UTF8 in this case. We do not
need to do the translation.

> 2/ when some program needs to display anything to stdout, he does it in
> utf-8 (or I suppose so if the terminal encoding is set to utf8?), then I
> shall decode it and display the resulting unicode value in some unicode
> font.
>

I assume the output is in UTF8 also.
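
For the decoding step you describe, something along these lines might be a
starting point. It is only a sketch: utf8_decode is a made-up name, nothing
like it exists in mrxvt as far as I know, and it follows the modern 1 to 4
byte form of UTF-8. Malformed input is turned into U+FFFD one byte at a
time, which is one common policy but not the only possible one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s, where len bytes are available.
 * On success stores the code point in *cp and returns the number of
 * bytes consumed (1 to 4).
 * Returns 0 if the sequence is cut off and more bytes are needed.
 * On malformed input stores U+FFFD and consumes exactly one byte. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3 or 4 bytes;
     * anything below that is an overlong encoding. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                     /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {    /* 110xxxxx */
        need = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {    /* 1110xxxx */
        need = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {    /* 11110xxx */
        need = 4; c = s[0] & 0x07;
    } else {                               /* stray continuation or bad lead byte */
        *cp = 0xFFFD;
        return 1;
    }

    for (i = 1; i < need; i++) {
        if (i >= len)
            return 0;                      /* wait for the rest of the sequence */
        if ((s[i] & 0xC0) != 0x80) {       /* not 10xxxxxx */
            *cp = 0xFFFD;
            return 1;
        }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
    if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return need;
}
```

The "return 0" case matters for a terminal: the slave's output arrives in
arbitrary chunks, so a multi-byte character can be split across two reads
and the leftover bytes have to be kept until the next chunk arrives.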

>
> Any correction is welcome, because it is only a guess about what seems
> some logic of what a terminal does.
>
> Then what are these functions which receive some input and have to
> transform it in the set encoding and the ones which receive an output
> and must decode it, then display it with unicode fonts?
>
> What I found in the code:
>
> - encoding.c: apparently defines the functions which take the locale,
> which decides what is the encoding from it and which font to use (all
> used during initialisation, init.c it seems)? Apparently there are also
> some conversion functions from one encoding to another... What are they
> for?
>

I think you do not need to care about encoding.c here. It is mainly for
CJK encodings, which I consider obsolete now, though I am using them every
day.

> - command.c: rxvt_cmd_getc get the next input character, then
> rxvt_process_getc will process it and the following characters (as much
> as possible) until some escape sequence or a non-printable character
> appears.
>

I think rxvt_cmd_getc handles both input channels. This does not seem to be
a big issue for us now.

> - screen.c: then rxvt_scr_add_lines should probably do something with
> the input string... but then I get lost about how this all is processed...
>

The biggest issue, from my point of view, is in screen.c. In the current
implementation, we assume each character is either 1 byte or 2 bytes (CJK
multichar). On screen, a 1-byte character occupies exactly one column and
a 2-byte character occupies exactly two columns. Thus, if the window width
is 80, a line holds exactly 80 bytes of data regardless of how many actual
characters there are. So for each line we allocate 80 bytes for the actual
characters and 80 words for the rendering characteristics, such as color,
bold font, and the CJK property (first byte or second byte of a CJK
character). But this mechanism breaks for UTF8, because each UTF8 character
may use a different number of bytes (one to four under RFC 3629, up to six
in the original definition), and its width on screen no longer follows
from its byte count!
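
A tiny illustration of why "bytes per line == columns per line" stops
working. The two helpers below are hypothetical and only for demonstration;
fits_legacy_line mimics the current assumption, it is not the real check in
screen.c.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in a UTF-8 buffer by skipping continuation
 * bytes (those of the form 10xxxxxx). Assumes well-formed input. */
static size_t utf8_char_count(const char *s, size_t nbytes)
{
    size_t i, n = 0;

    assert(s != NULL);
    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* The legacy assumption: a line of ncols columns holds ncols bytes. */
static int fits_legacy_line(size_t nbytes, size_t ncols)
{
    return nbytes <= ncols;
}
```

For example, thirty euro signs are 90 bytes of UTF8 but only 30 characters,
so fits_legacy_line(90, 80) says no even though they need far fewer than 80
columns. Going the other way, two CJK characters are 6 bytes in UTF8 and
take 4 columns, so neither the byte count nor the character count gives the
column count directly.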

My suggestion (again, immature) is to redesign text_t and rend_t. For
text_t, we should allow multiple bytes. For rend_t, we should store the
actual width of the character on screen. The second is easy: only one
bit is needed if we assume each character uses at most 2 columns. But
the first needs careful consideration: if we simply define text_t as
an array of 6 bytes, we may end up wasting a huge amount of memory!
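
One possible direction, as a rough sketch only: every Unicode code point
fits in 21 bits, so text_t could hold one decoded code point in 32 bits
instead of raw UTF8 bytes, which costs 4 bytes per cell rather than 6. The
flag RS_WIDE and the functions approx_width and put_cell are names I made up
for this example, and the typedefs are what I am proposing, not the current
definitions. approx_width covers only a few obvious ranges and ignores
zero-width combining characters; real code would want a proper wcwidth().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t text_t;   /* proposed: one Unicode code point per cell */
typedef uint32_t rend_t;   /* rendering attributes, one word per cell */

#define RS_WIDE 0x80000000u   /* hypothetical bit: cell starts a 2-column char */

/* Deliberately incomplete width guess: 2 for a few well-known wide
 * ranges, 1 for everything else. */
static int approx_width(uint32_t cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo initial consonants */
        (cp >= 0x3040 && cp <= 0x30FF) ||   /* Hiragana and Katakana */
        (cp >= 0x4E00 && cp <= 0x9FFF) ||   /* CJK Unified Ideographs */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xFF01 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Store cp at column col of a line that is ncols columns wide.
 * Returns the number of columns used (1 or 2), or 0 if it does not fit. */
static int put_cell(text_t *text, rend_t *rend, int ncols, int col,
                    uint32_t cp, rend_t attrs)
{
    int w;

    assert(text != NULL && rend != NULL);
    w = approx_width(cp);
    if (col < 0 || col + w > ncols)
        return 0;

    text[col] = cp;
    rend[col] = (attrs & ~RS_WIDE) | (w == 2 ? RS_WIDE : 0u);
    if (w == 2) {
        /* The right half of a wide character is an empty placeholder cell. */
        text[col + 1] = 0;
        rend[col + 1] = attrs & ~RS_WIDE;
    }
    return w;
}
```

With this layout an 80-column line needs 320 bytes of text instead of 80,
so it is still four times the current cost; whether that is acceptable with
a large scrollback is exactly the kind of thing we would need to measure.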

Any ideas?

I think the following link provides a good reference and we should carefully
study it:

http://www.cl.cam.ac.uk/~mgk25/unicode.html#term

All the best,

Jimmy

>
> Can you help me decoding this whole code logic please?
> Thanks.
>
> Jey
>
> P.S.: sorry for this long email as a beginning email (especially if
> someone is already taking care of utf8, which is great, and then my
> email useless), but as I have just looked in the code, I wanted to ask
> questions while it was fresh. Now let's go to sleep. I must go to work
> in a few hours...
>

_______________________________________________
Materm-devel mailing list
Materm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/materm-devel
mrxvt home page: http://materm.sourceforge.net
