Hello Mike, Thanks for working on this!
Here’s a review of the ‘mike-port-encodings’ branch. "Michael Gran" <[email protected]> writes: > commit 26e8fd7459581f09857cccc8ddf1c366d2caea87 > Author: Michael Gran <[email protected]> > Date: Sat Jun 5 13:40:32 2010 -0700 > > Simplify scm_i_scan_for_encoding > > Note especially that the variable 'i' has two different uses in this > function, and they get confused. > > * libguile/read.c (scm_i_scan_for_encoding): cleanup - while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE - && pos + i - header < bytes_read - && (isalnum ((int) pos[i]) || strchr ("_-.:/,+=()", pos[i]) != NULL)) + while (encoding_start + i - header <= SCM_ENCODING_SEARCH_SIZE + && encoding_start + i - header < bytes_read + && (isalnum ((int) encoding_start[i]) + || strchr ("_-.:/,+=()", encoding_start[i]) != NULL)) Hmm, “simplification”? :-) Is this commit really needed? What’s the operational effect? > commit 44c705d173d52cb485d761a51425b4d8e310a67a > Author: Michael Gran <[email protected]> > Date: Sat Jun 5 14:26:14 2010 -0700 > > read-line should use port's encoding, not locale's encoding > > * libguile/rdelim.c (scm_read_line): modified, use port's encoding > OK. Would it be possible to write a test case, perhaps with a string port? + line = scm_from_stringn (s, slen-1, enc, hndl); Please leave spaces around binary operators. > commit 62abf139798af3f35ca3c89200389dc75d51a159 > Author: Michael Gran <[email protected]> > Date: Sat Jun 5 14:19:37 2010 -0700 > > open-file should handle binary mode and coding declarations > > The open-file port should use the binary-friendly ISO-8859-1 encoding when > a file is opened using mode "b". Also, it should honor a "coding:" > declaration at the top of a file when reading files where it is present. > > * libguile/fports.c (scm_open_file): modified > * test-suite/tests/ports.test: more tests for open-file > * doc/ref/api-io.texi (File Ports): more documentation for open-file --- a/doc/ref/api-io.texi +++ b/doc/ref/api-io.texi @@ -130,8 +130,8 @@ this fluid otherwise. @deffn {Scheme Procedure} port-encoding port @deffnx {C Function} scm_port_encoding -Returns, as a string, the character encoding that @var{port} uses to -interpret its input and output. +Returns, as a string, the character encoding that @var{port} uses to interpret +its input and output. The value @code{#f} is equivalent to @code{"ISO-8859-1"}. Instead of having the #f special case, I’ve been thinking that it would be nicer and easier to have ‘port-encoding’ always return a string (similar for ‘set-port-encoding!’), which would be "ISO-8859-1" instead of #f. Perhaps it’s a bit late, though. What do you think? +Also, open the file using the binary-compatible character encoding +"ISO-8859-1", ignoring any coding declaration or port encoding. I dislike this whole notion of “binary-compatibility”. What is meant here is that for binary I/O, you’d better choose ISO-8859-1 as the port’s encoding because non-ASCII byte values will go through as is, right? (I suppose that’s true of any 8-bit encoding.) OTOH, the (rnrs io ports) API does binary I/O as expected regardless of the port’s encoding, so that’s what users should choose when doing binary I/O, I think. Thus I’d suggest removing any references to “binary-compatibility”. +When the file is opened, this procedure will scan for a coding +declaration (@pxref{Character Encoding of Source Files}). If present +will use that encoding for interpreting the file. Otherwise, the +port's encoding will be used. How about adding: “, which defaults to the value of @code{%default-port-encoding} (@pxref{Ports, @code{%default-port-encoding}}).” +the file in binary mode and then set the port encoding explicitly +using @code{set-port-encoding!}. Again, there’s no real binary mode. "The following additional characters can be appended:\n" "@table @samp\n" "@item b\n" - "Open the underlying file in binary mode, if supported by the operating system. " + "Open the underlying file in binary mode, if supported by the system.\n" + "Also, open the file using the binary-compatible character encoding\n" + "\"ISO-8859-1\", ignoring the port's encoding and the coding declaration\n" + "at the top of the input file, if any.\n" OK, except for “binary compatible”. +;;; binary mode ignores port encoding +(with-fluids ((%default-port-encoding "UTF-8")) + (let* ((filename (test-file)) + (port (open-file filename "w")) + (test-string "一二三") + (binary-test-string + (apply string + (map integer->char + (uniform-vector->list (string->utf8 test-string)))))) + (write-line test-string port) + (close-port port) + (let* ((in-port (open-file filename "rb")) + (line (read-line in-port))) + (close-port in-port) + (pass-if "file: binary mode ignores port encoding" + (string=? line binary-test-string))))) Rather put all the test under ‘pass-if’ so that errors are attributed to this test. There should also be a (delete-file filename) as the last thunk of a ‘dynamic-wind’ (otherwise this could break ‘make distcheck’.) Other than that the tests look good to me. Thanks, Ludo’.
