Hello!

Mike Gran <spk...@yahoo.com> writes:

> 1. To move to a Unicode-enabled guile, text information needs to be
>    converted to an internal representation when read and converted
>    back to the locale when written.  Most reading and writing for
>    ports passes through scm_getc (input) and scm_lfwrite (output).
>    Conversion between locale strings and internal strings should
>    happen there.

One strategy could be to have a new C port API, e.g., roughly based on
R6RS', with transcoders and all, and somehow arrange to have the current
port "API" mapped to that new shiny API.  It might be a bit ambitious,
though.

>    This implies that a source code file should have syntax to
>    indicate its own encoding, if it is not ASCII.  Something akin to
>    the <?xml encoding="utf-8"?> line in HTML files.

One could imagine special treatment of, say, the first 10 lines of a
file, with the ability to recognize Emacs file variables like
"-*- coding: utf-8 -*-" and to change the current port transcoder
accordingly, something like that.

By default, which encoding is used by `read' would be determined by the
input port's encoder.

> 3. The text encoding of a port needs to be associated with the port.
>    R6RS has the idea of transcoders for ports that require
>    conversion.  It is daunting, but, having played some ideas for a
>    few weeks, it seems that at least a subset of the transcoder
>    functionality needs to be implemented for this to make any sense.

Yes.

> I sent in my copyright assignment last week, so you should have it
> now.

Cool!

IIRC, the first step you suggested was the implementation of wide
string/char types.  Did you also work on this?

Thanks,
Ludo'.
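
PS: To make the "first 10 lines" idea above a bit more concrete, here is a minimal sketch in plain C.  It does not use any Guile API: `scan_coding_cookie', `find_in_range' and `MAX_COOKIE_LINES' are names made up for illustration, and it works on a raw buffer, whereas a real version would presumably sit in the reader and switch the port's transcoder.  It also only looks for a "coding:" key after an opening "-*-" on the same line, which is a simplification of what Emacs accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption taken from the discussion: only the first 10 lines count. */
#define MAX_COOKIE_LINES 10

/* Hypothetical helper: find NEEDLE inside [start, end), or return NULL. */
static const char *
find_in_range (const char *start, const char *end, const char *needle)
{
  size_t n = strlen (needle);

  for (; (size_t) (end - start) >= n; start++)
    if (memcmp (start, needle, n) == 0)
      return start;
  return NULL;
}

/* Scan the first MAX_COOKIE_LINES lines of BUF (LEN bytes) for an
   Emacs-style "-*- ... coding: NAME ... -*-" cookie.  On success, copy
   NAME (possibly truncated) into OUT and return 1; otherwise store an
   empty string in OUT and return 0.  */
int
scan_coding_cookie (const char *buf, size_t len, char *out, size_t out_size)
{
  const char *p = buf;
  const char *buf_end = buf + len;
  int line;

  assert (buf != NULL && out != NULL && out_size > 0);

  for (line = 0; line < MAX_COOKIE_LINES && p < buf_end; line++)
    {
      const char *eol = memchr (p, '\n', (size_t) (buf_end - p));
      const char *line_end = eol ? eol : buf_end;
      const char *open = find_in_range (p, line_end, "-*-");
      const char *key =
        open ? find_in_range (open + 3, line_end, "coding:") : NULL;

      if (key != NULL)
        {
          const char *v = key + strlen ("coding:");
          size_t n = 0;

          /* Skip blanks between "coding:" and the encoding name.  */
          while (v < line_end && (*v == ' ' || *v == '\t'))
            v++;

          /* Copy name characters; a '-' is part of the name unless it
             starts the closing "-*-" marker.  */
          while (v < line_end && n + 1 < out_size)
            {
              int closing = (line_end - v) >= 3 && memcmp (v, "-*-", 3) == 0;

              if (closing
                  || !(isalnum ((unsigned char) *v)
                       || *v == '-' || *v == '_' || *v == '.'))
                break;
              out[n++] = *v++;
            }
          out[n] = '\0';
          if (n > 0)
            return 1;
        }

      p = eol ? eol + 1 : buf_end;
    }

  out[0] = '\0';
  return 0;
}
```

The caller would hand it the first chunk of the file, and if it returns 1, look up a transcoder for the name left in OUT; when it returns 0, the port's default encoding stays in effect.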