`mike-port-encodings' branch (bug #29643)

Ludovic Courtès Tue, 08 Jun 2010 15:29:31 -0700

Hello Mike,

Thanks for working on this!


Here’s a review of the ‘mike-port-encodings’ branch.

"Michael Gran" <[email protected]> writes:

> commit 26e8fd7459581f09857cccc8ddf1c366d2caea87
> Author: Michael Gran <[email protected]>
> Date:   Sat Jun 5 13:40:32 2010 -0700
>
>     Simplify scm_i_scan_for_encoding
>     
>     Note especially that the variable 'i' has two different uses in this
>     function, and they get confused.
>     
>     * libguile/read.c (scm_i_scan_for_encoding): cleanup

-  while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE 
-         && pos + i - header < bytes_read
-        && (isalnum ((int) pos[i]) || strchr ("_-.:/,+=()", pos[i]) != NULL))
+  while (encoding_start + i - header <= SCM_ENCODING_SEARCH_SIZE
+         && encoding_start + i - header < bytes_read
+        && (isalnum ((int) encoding_start[i])
+            || strchr ("_-.:/,+=()", encoding_start[i]) != NULL))

Hmm, “simplification”?  :-)

Is this commit really needed?  What’s the operational effect?

> commit 44c705d173d52cb485d761a51425b4d8e310a67a
> Author: Michael Gran <[email protected]>
> Date:   Sat Jun 5 14:26:14 2010 -0700
>
>     read-line should use port's encoding, not locale's encoding
>     
>     * libguile/rdelim.c (scm_read_line): modified, use port's encoding
>

OK.  Would it be possible to write a test case, perhaps with a string
port?

+         line = scm_from_stringn (s, slen-1, enc, hndl);

Please leave spaces around binary operators.

> commit 62abf139798af3f35ca3c89200389dc75d51a159
> Author: Michael Gran <[email protected]>
> Date:   Sat Jun 5 14:19:37 2010 -0700
>
>     open-file should handle binary mode and coding declarations
>     
>     The open-file port should use the binary-friendly ISO-8859-1 encoding when
>     a file is opened using mode "b".  Also, it should honor a "coding:"
>     declaration at the top of a file when reading files where it is present.
>     
>     * libguile/fports.c (scm_open_file): modified
>     * test-suite/tests/ports.test: more tests for open-file
>     * doc/ref/api-io.texi (File Ports): more documentation for open-file

  --- a/doc/ref/api-io.texi
  +++ b/doc/ref/api-io.texi
  @@ -130,8 +130,8 @@ this fluid otherwise.

   @deffn {Scheme Procedure} port-encoding port
   @deffnx {C Function} scm_port_encoding
  -Returns, as a string, the character encoding that @var{port} uses to
  -interpret its input and output.
  +Returns, as a string, the character encoding that @var{port} uses to 
interpret
  +its input and output.  The value @code{#f} is equivalent to 
@code{"ISO-8859-1"}.

Instead of having the #f special case, I’ve been thinking that it would
be nicer and easier to have ‘port-encoding’ always return a string
(similar for ‘set-port-encoding!’), which would be "ISO-8859-1" instead
of #f.  Perhaps it’s a bit late, though.

What do you think?

  +Also, open the file using the binary-compatible character encoding
  +"ISO-8859-1", ignoring any coding declaration or port encoding.

I dislike this whole notion of “binary-compatibility”.  What is meant
here is that for binary I/O, you’d better choose ISO-8859-1 as the
port’s encoding because non-ASCII byte values will go through as is,
right?  (I suppose that’s true of any 8-bit encoding.)

OTOH, the (rnrs io ports) API does binary I/O as expected regardless of
the port’s encoding, so that’s what users should choose when doing
binary I/O, I think.

Thus I’d suggest removing any references to “binary-compatibility”.
 
  +When the file is opened, this procedure will scan for a coding
  +declaration (@pxref{Character Encoding of Source Files}). If present
  +will use that encoding for interpreting the file.  Otherwise, the
  +port's encoding will be used.

How about adding: “, which defaults to the value of
@code{%default-port-encoding} (@pxref{Ports,
@code{%default-port-encoding}}).”

  +the file in binary mode and then set the port encoding explicitly
  +using @code{set-port-encoding!}.

Again, there’s no real binary mode.

              "The following additional characters can be appended:\n"
              "@table @samp\n"
              "@item b\n"
  -         "Open the underlying file in binary mode, if supported by the 
operating system. "
  +         "Open the underlying file in binary mode, if supported by the 
system.\n"
  +         "Also, open the file using the binary-compatible character 
encoding\n"
  +         "\"ISO-8859-1\", ignoring the port's encoding and the coding 
declaration\n"
  +         "at the top of the input file, if any.\n"

OK, except for “binary compatible”.

  +;;; binary mode ignores port encoding
  +(with-fluids ((%default-port-encoding "UTF-8"))
  +  (let* ((filename (test-file))
  +         (port (open-file filename "w"))
  +         (test-string "一二三")
  +         (binary-test-string
  +          (apply string
  +                 (map integer->char
  +                      (uniform-vector->list (string->utf8 test-string))))))
  +    (write-line test-string port)
  +    (close-port port)
  +    (let* ((in-port (open-file filename "rb"))
  +           (line (read-line in-port)))
  +      (close-port in-port)
  +      (pass-if "file: binary mode ignores port encoding"
  +               (string=? line binary-test-string)))))

Rather put all the test under ‘pass-if’ so that errors are attributed to
this test.  There should also be a (delete-file filename) as the last
thunk of a ‘dynamic-wind’ (otherwise this could break ‘make distcheck’.)

Other than that the tests look good to me.

Thanks,
Ludo’.

`mike-port-encodings' branch (bug #29643)

Reply via email to