> From: Andy Wingo <wi...@pobox.com>
>
> Hi Mike,
>
> > The reader could probably preprocess the file looking for where
> > the text "coding: XXXXX" appears within a comment in the top dozen
> > lines of a source code file. Or perhaps a line that is explicitly
> > ";;;; #pragma coding: XXXXX" in the top few lines of a file.
>
> This sounds almost sane to me. I think python has a standard for this:
>
> http://www.python.org/dev/peps/pep-0263/
>
> This is complicated in Guile by #!. A reasonable thing would be to have
> the reader have a bit on whether it actually saw an expression yet or
> not. If not, "^;+ [^\n]*coding: ..." would set the file's encoding.
Works for me. I'll do that.

Also, just for the record, it seems obvious that this character encoding pragma should only work on files, which is fine. I think that is the way it would work. One could imagine a use where someone loaded code into a string and then passed it to scm_read() for interpretation. In that case, I think "coding: XXXX" or whatever should not be interpreted. scm_read() can't handle this on its own because it has no "state": it is called once per expression. This all means that grepping for the coding is a true preprocessing step, divorced from the reader.

--

While we're on the topic, here's some serious pedantry about it all. Fascinating to me, of course. Less so to others, I'm sure. Feel free to zone out...

I went back and forth on whether each port should have its own dedicated character encoding, or whether it was okay to have a single encoding for all ports in a thread. I've been going with the single-encoding plan because R6RS I/O ports have a strong API for that, while the legacy Guile port API does not consider it. I've been trying not to modify the Guile API. For backwards compatibility, if no locale or encoding is set, Guile ports should still function exactly as before. I don't want to break anything.

The medium-term plan is that if a program wants to read/write data that is not in its locale encoding, it should prefer R6RS ports. If it wants to read/write data in its current locale and encoding, Guile ports or R6RS ports should handle that transparently.

The procedure scm_read is firm API and takes a port, which means that the s-expression it reads will be interpreted in the context of the port's encoding. It is the default reader. But if the reader is modified to take its character encoding from the top of the file, then the reader can't use scm_read directly, since it would use the port's encoding.
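The coding-grepping preprocessing step mentioned earlier might look roughly like the following. This is a minimal sketch in Python, for illustration only: Guile's reader is written in C. The names `scan_coding_pragma` and `file_coding` are hypothetical and are not Guile API. The 12-line limit is an assumption taken from the "top dozen lines" suggestion. The sketch applies Andy's pattern `^;+ [^\n]*coding: ...` only to comment lines that come before the first expression. It also assumes that a leading `#!` header ends at a line beginning with `!#`.

```python
import re
from typing import Optional

# Andy's proposed pattern: one or more semicolons at column 0, a space,
# then anything up to "coding: NAME". NAME is captured.
CODING_RE = re.compile(r"^;+ [^\n]*coding: ([-\w.]+)")


def scan_coding_pragma(text: str) -> Optional[str]:
    """Return the declared encoding name, or None if no pragma precedes
    the first expression (hypothetical helper, not Guile API)."""
    lines = text.splitlines()
    i = 0
    # Assumption: a leading "#!" header is closed by a line starting with "!#".
    if lines and lines[0].startswith("#!"):
        while i < len(lines) and not lines[i].startswith("!#"):
            i += 1
        i += 1  # skip the "!#" line itself
    for line in lines[i:]:
        stripped = line.lstrip()
        if stripped.startswith(";"):
            # A comment line: only column-0 comments can carry the pragma.
            m = CODING_RE.match(line)
            if m:
                return m.group(1)
        elif stripped:
            # The reader has now "seen an expression", so any later
            # "coding:" comment is ignored.
            return None
    return None


def file_coding(path: str, max_lines: int = 12) -> Optional[str]:
    """Scan only the head of a file (hypothetical helper). Bytes are decoded
    as latin-1 so that any byte sequence decodes; the pragma itself is
    assumed to be ASCII."""
    with open(path, "rb") as f:
        head = b"".join(f.readline() for _ in range(max_lines))
    return scan_coding_pragma(head.decode("latin-1"))


if __name__ == "__main__":
    src = ('#!/usr/bin/guile \\\n-s\n!#\n'
           ';;; -*- coding: iso-8859-1 -*-\n(display "hi")\n')
    print(scan_coding_pragma(src))  # iso-8859-1
```

Because the helper works on the raw head of a file rather than on a port, it never sees strings that are handed straight to scm_read(). Whatever name it returns would still have to reach a reader that does not simply take the port's encoding.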
It isn't as simple as pushing the old encoding, interpreting under the file's encoding, and then popping the old encoding. Since all ports in the thread share one encoding, output to stdout and stderr would then appear in the file's encoding and not in the encoding of the terminal's locale. So it needs a new reader, scm_read_with_encoding() or some such.

-Mike Gran