Re: [Guile-commits] GNU Guile branch, string_abstraction2, updated. 823e444052817ee120d87a3575acb4f767f17475

Mike Gran Thu, 28 May 2009 12:57:49 -0700

> From: Andy Wingo <wi...@pobox.com>
> 
> Hi Mike,
> 
> > The reader could probably preprocess the file looking for where 
> > the text "coding: XXXXX" appears within a comment in the top dozen
> > lines of a source code file. Or perhaps a line that is explicitly
> > ";;;; #pragma coding: XXXXX" in the top few lines of a file.
> 
> This sounds almost sane to me. I think python has a standard for this:
> 
>   http://www.python.org/dev/peps/pep-0263/
> 
> This is complicated in Guile by #!. A reasonable thing would be to have
> the reader have a bit on whether it actually saw an expression yet or
> not. If not, "^;+ [^\n]*coding: ..." would set the file's encoding.


Works for me.  I'll do that. 
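
A rough sketch of that scan, written in Python rather than C only to
keep it short. The names find_coding and CODING_RE are made up for
illustration and are not Guile API; the pattern is adapted from the
regex quoted above, and the sketch assumes a leading "#!" opens a
block comment that runs to the next "!#", as in Guile scripts.

```python
import re

# Hypothetical pattern modeled on "^;+ [^\n]*coding: ..." from the thread.
CODING_RE = re.compile(r"^;+ [^\n]*coding: *([-\w.]+)")

def find_coding(lines, max_lines=12):
    """Return the declared encoding name, or None if none is found
    before the first expression."""
    in_block = False  # inside a "#! ... !#" block comment
    for line in lines[:max_lines]:
        text = line.strip()
        if in_block:
            if "!#" in text:
                in_block = False
            continue
        if text.startswith("#!"):
            # The block may close on the same line or on a later one.
            in_block = "!#" not in text[2:]
            continue
        if not text:
            continue  # blank lines are not expressions
        if not text.startswith(";"):
            return None  # an expression was seen first: stop looking
        match = CODING_RE.match(text)
        if match:
            return match.group(1)
    return None
```

For a script that starts with "#!/usr/bin/guile -s", "!#" and then
";;; -*- coding: utf-8 -*-", this returns "utf-8". A coding comment
that appears after the first expression is ignored.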

Also, just for the record, it seems obvious that this character 
encoding pragma should only work on files, which is fine.  I think
that is the way it would work.  One could imagine a use where
someone loaded code into a string and then passed it to scm_read()
for interpretation.  In this case, I think "coding: XXXX" or
whatever should not be interpreted.

scm_read() can't handle this on its own because it has no "state".
It is called once per expression.

This all means that grepping the coding is a true preprocessing
step, divorced from the reader.
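
To illustrate what "preprocessing step" means here, a minimal Python
sketch under my own assumptions: decode_source is a hypothetical
helper, the default of UTF-8 is arbitrary, and for brevity it does not
check whether an expression precedes the declaration. It works on the
raw bytes of a file before any reader sees them.

```python
import re

# Bytes pattern, since the encoding is not known yet at this point.
CODING_RE = re.compile(rb"^;+ [^\n]*coding: *([-\w.]+)", re.MULTILINE)

def decode_source(raw, default="utf-8", max_lines=12):
    """Pick the encoding from the first few lines of a file's bytes,
    then decode the whole file with it."""
    header = b"\n".join(raw.split(b"\n")[:max_lines])
    match = CODING_RE.search(header)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding)
```

Only file contents go through decode_source. Code that is already in a
string goes straight to the reader, so any "coding:" text inside it is
never interpreted.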

--

While we're on the topic, here's some serious pedantry about it all.
Fascinating to me, of course.  Less so to others, I'm sure.  Feel
free to zone out...

I went back and forth on the idea as to whether each port should have
its own dedicated character encoding, or if it was okay to have a
single encoding for all ports in a thread.  I've been going with the
single-encoding plan because R6RS I/O ports already have a strong API
for per-port encodings, while the legacy Guile port API does not
consider encoding at all.  I've been trying not to modify the Guile API.

For backwards compatibility, if no locale or encoding is set, Guile
ports should still function exactly as before.  I don't want to break
anything.
 
The medium-term plan is that if a program wants to read/write data that
is not in its locale encoding, it should prefer R6RS ports.  If it 
wants to read/write data in its current locale and encoding, Guile 
ports or R6RS ports should handle that transparently.

The procedure scm_read is firm API and takes a port, which means 
that the s-expression it reads will be interpreted in the context of
the port's encoding.  It is the default reader.

But, if the reader is modified to take its character encoding from
the top of the file, then the reader can't use scm_read directly 
as it would use the port's encoding.

It isn't as simple as pushing the old encoding, interpreting 
under the file's encoding, and then popping the old encoding, because
the output to stdout and stderr would then appear in the file's
encoding and not the terminal's locale's encoding.

So it needs a new reader, scm_read_with_encoding() or some such.
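
A toy Python model of the push/pop problem, just to make it concrete.
Everything here is hypothetical: current_encoding stands in for the
single per-thread encoding, port_write for any port output, and
read_with_encoding for the kind of reader that takes its encoding as
an argument. None of it is Guile code.

```python
import io

current_encoding = "utf-8"  # stands in for the one encoding per thread

def port_write(port, text):
    """Every port encodes with the shared, thread-wide encoding."""
    port.write(text.encode(current_encoding))

def load_with_push_pop(raw, file_encoding, terminal):
    """Swap the shared encoding while loading, then restore it."""
    global current_encoding
    saved = current_encoding
    current_encoding = file_encoding        # push
    try:
        text = raw.decode(current_encoding)
        port_write(terminal, text)          # output produced while loading
    finally:
        current_encoding = saved            # pop
    return text

def read_with_encoding(raw, encoding):
    """Decode with an explicit encoding; shared state is untouched."""
    return raw.decode(encoding)
```

With a Latin-1 file and a UTF-8 terminal, load_with_push_pop writes
Latin-1 bytes to the terminal, which is the wrong encoding for it.
With read_with_encoding, the file is decoded as Latin-1 and later
output still goes out as UTF-8.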



-Mike Gran
