Re: Unicode, ports and encoding

Ludovic Courtès Tue, 17 Feb 2009 13:55:01 -0800

Hello!

Mike Gran <spk...@yahoo.com> writes:


> 1.  To move to a Unicode-enabled guile, text information needs to be
>     converted to an internal representation when read and converted
>     back to the locale when written.  Most reading and writing for
>     ports passes through scm_getc (input) and scm_lfwrite (output).
>     Conversion between locale strings and internal strings should
>     happen there.

One strategy could be to have a new C port API, e.g., roughly based on
R6RS', with transcoders and all, and somehow arrange to have the current
port "API" mapped to that new shiny API.  It might be a bit ambitious,
though.
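As a rough illustration of the decoding half of such an API, here is a minimal C sketch. A port carries a codec, and a getc-like function returns Unicode code points instead of bytes. All names here (struct port, port_getc, the CODEC_* values) are made up for illustration and are not Guile's API. Only Latin-1 and UTF-8 are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codec identifiers; a real transcoder would also carry
   an end-of-line style and an error-handling mode. */
enum codec { CODEC_LATIN1, CODEC_UTF8 };

/* Hypothetical byte-oriented port reading from an in-memory buffer. */
struct port {
  const unsigned char *buf;
  size_t len;
  size_t pos;
  enum codec codec;
};

#define PORT_EOF (-1L)
#define PORT_DECODING_ERROR (-2L)

/* Read one character (a Unicode code point) from P, decoding bytes
   according to the port's codec.  Returns PORT_EOF at end of input and
   PORT_DECODING_ERROR on malformed input; no bytes are consumed on
   error, so the caller decides how to recover. */
long port_getc (struct port *p)
{
  if (p->pos >= p->len)
    return PORT_EOF;

  unsigned char b0 = p->buf[p->pos];

  if (p->codec == CODEC_LATIN1)
    {
      /* Latin-1: each byte is the code point with the same value. */
      p->pos++;
      return b0;
    }

  /* UTF-8: the lead byte determines the sequence length. */
  size_t n;
  uint32_t cp;
  if (b0 < 0x80)
    { n = 1; cp = b0; }
  else if ((b0 & 0xE0) == 0xC0)
    { n = 2; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0)
    { n = 3; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0)
    { n = 4; cp = b0 & 0x07; }
  else
    return PORT_DECODING_ERROR;

  if (p->pos + n > p->len)
    return PORT_DECODING_ERROR;          /* truncated sequence */

  for (size_t i = 1; i < n; i++)
    {
      unsigned char b = p->buf[p->pos + i];
      if ((b & 0xC0) != 0x80)
        return PORT_DECODING_ERROR;      /* not a continuation byte */
      cp = (cp << 6) | (b & 0x3F);
    }

  /* Reject overlong forms, surrogates and values above U+10FFFF. */
  static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return PORT_DECODING_ERROR;

  p->pos += n;
  return (long) cp;
}
```

With such a function in place, `read' and friends would only ever see code points, whatever the byte encoding of the underlying port.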

>     This implies that a source code file should have syntax to
>     indicate its own encoding, if it is not ASCII.  Something akin to
>     the <?xml encoding="utf-8"?> line in XML files.

One could imagine special treatment of, say, the first 10 lines of a
file, with the ability to recognize Emacs file variables like
"-*- coding: utf-8 -*-" and to change the current port transcoder
accordingly, something like that.
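Here is roughly what the "first N lines" scan could look like in C. find_coding_cookie is a hypothetical helper, not an existing Guile function. It just does a substring search for "coding:", which also covers the "-*- ... coding: ... -*-" form. Switching the port's transcoder once a cookie is found, including what to do with bytes already read, is left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan at most MAX_LINES lines at the start of BUF
   (LEN bytes) for an Emacs-style "coding:" declaration such as
   "-*- coding: utf-8 -*-".  On success, copy the encoding name into OUT
   (NUL-terminated, truncated to fit OUTSZ bytes) and return 1;
   otherwise return 0. */
int find_coding_cookie (const char *buf, size_t len, int max_lines,
                        char *out, size_t outsz)
{
  static const char key[] = "coding:";
  const size_t keylen = sizeof key - 1;
  size_t line_start = 0;

  for (int line = 0; line < max_lines && line_start < len; line++)
    {
      /* Find the end of the current line. */
      size_t line_end = line_start;
      while (line_end < len && buf[line_end] != '\n')
        line_end++;

      for (size_t i = line_start; i + keylen <= line_end; i++)
        {
          if (memcmp (buf + i, key, keylen) != 0)
            continue;

          /* Skip blanks after "coding:", then copy the name, which ends
             at whitespace, ';' or the end of the line. */
          size_t j = i + keylen;
          while (j < line_end && (buf[j] == ' ' || buf[j] == '\t'))
            j++;
          size_t k = 0;
          while (j < line_end && k + 1 < outsz
                 && !isspace ((unsigned char) buf[j]) && buf[j] != ';')
            out[k++] = buf[j++];
          if (k == 0)
            continue;            /* empty name: keep looking */
          out[k] = '\0';
          return 1;
        }

      line_start = line_end + 1; /* skip the newline */
    }
  return 0;
}
```

Limiting the scan to a fixed number of lines keeps the cost bounded and avoids picking up a stray "coding:" deep inside the file.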

By default, the encoding used by `read' would be determined by the
input port's transcoder.

> 3.  The text encoding of a port needs to be associated with the port.
>     R6RS has the idea of transcoders for ports that require
>     conversion.  It is daunting, but, having played with some ideas
>     for a few weeks, it seems that at least a subset of the transcoder
>     functionality needs to be implemented for this to make any sense.

Yes.
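For the output direction (the conversion scm_lfwrite would have to do), the codec attached to the port decides how each character becomes bytes. Below is a minimal sketch under the same assumptions as before: encode_char and the CODEC_* values are hypothetical, and only Latin-1 and UTF-8 are covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codec identifiers, as a port's transcoder might carry. */
enum codec { CODEC_LATIN1, CODEC_UTF8 };

/* Encode code point CP into OUT (room for 4 bytes) using CODEC.
   Returns the number of bytes written, or 0 if CP cannot be
   represented: above U+00FF for Latin-1, or not a Unicode scalar
   value (a surrogate or above U+10FFFF) for UTF-8. */
size_t encode_char (enum codec codec, uint32_t cp, unsigned char out[4])
{
  if (codec == CODEC_LATIN1)
    {
      if (cp > 0xFF)
        return 0;
      out[0] = (unsigned char) cp;
      return 1;
    }

  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                         /* surrogates are not encodable */
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

A write routine would loop over the characters and call this for each one. What to do on a 0 return (raise an error, or substitute a replacement character) is exactly the choice that R6RS transcoders expose as an error-handling mode.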

> I sent in my copyright assignment last week, so you should have it
> now.

Cool!

IIRC, the first step you suggested was the implementation of wide
string/char types.  Did you also work on this?
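For reference, one common way to do wide strings is to keep them narrow (one byte per character, Latin-1) until a character above U+00FF is stored, and only then widen the buffer to 32 bits per character. This is sketched here purely as an illustration. The struct xstring type and xstr_* functions are hypothetical and not taken from Guile.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string type: narrow (uint8_t per char, Latin-1) until a
   code point above U+00FF is stored, then wide (uint32_t per char). */
struct xstring {
  size_t len;
  int wide;                       /* 0: narrow, 1: wide */
  union { uint8_t *u8; uint32_t *u32; } chars;
};

/* Create a narrow string of LEN NUL characters; returns 0 on
   allocation failure. */
int xstr_init (struct xstring *s, size_t len)
{
  s->len = len;
  s->wide = 0;
  s->chars.u8 = calloc (len ? len : 1, 1);
  return s->chars.u8 != NULL;
}

/* Return the code point at index I (no bounds checking). */
uint32_t xstr_ref (const struct xstring *s, size_t i)
{
  return s->wide ? s->chars.u32[i] : s->chars.u8[i];
}

/* Copy the 8-bit buffer into a freshly allocated 32-bit one. */
static int xstr_widen (struct xstring *s)
{
  uint32_t *w = malloc ((s->len ? s->len : 1) * sizeof *w);
  if (w == NULL)
    return 0;
  for (size_t i = 0; i < s->len; i++)
    w[i] = s->chars.u8[i];
  free (s->chars.u8);
  s->chars.u32 = w;
  s->wide = 1;
  return 1;
}

/* Store code point C at index I, widening first if it does not fit in
   8 bits (no bounds checking); returns 0 on allocation failure. */
int xstr_set (struct xstring *s, size_t i, uint32_t c)
{
  if (!s->wide && c > 0xFF && !xstr_widen (s))
    return 0;
  if (s->wide)
    s->chars.u32[i] = c;
  else
    s->chars.u8[i] = (uint8_t) c;
  return 1;
}

void xstr_free (struct xstring *s)
{
  if (s->wide)
    free (s->chars.u32);
  else
    free (s->chars.u8);
}
```

The appeal of this layout is that ASCII/Latin-1 text, which is most existing Scheme code, keeps costing one byte per character.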

Thanks,
Ludo'.
