Package: guile-3.0
Version: 3.0.5-4
Severity: normal

The guile-3.0(1) command-line interface is documented to take the filename
of a script to execute.  Here's how well it understands filenames:

$ echo '(display "perplexity") (newline)' > $'foo\xf4\x90\x80\x80bar.scm'
$ LC_ALL=de_DE.iso88591 guile-3.0 --no-auto-compile $'foo\xf4\x90\x80\x80bar.scm' 2>&1 | cat -v
perplexity
$ LC_ALL=de_DE.utf8 guile-3.0 --no-auto-compile $'foo\xf4\x90\x80\x80bar.scm' 2>&1 | cat -v
;;; Stat of /home/zefram/tmp/g0/fooM-tM-^PM-^@M-^@bar.scm failed:
;;; Throw to key `decoding-error' with args `("scm_to_utf8_stringn" "invalid codepoint in string" 84 "/home/zefram/tmp/g0/foo\U110000bar.scm")'.
Backtrace:
           0 (primitive-load "/home/zefram/tmp/g0/foo\U110000bar.scm")

ERROR: In procedure primitive-load:
Throw to key `decoding-error' with args `("scm_to_utf8_stringn" "invalid codepoint in string" 84 "/home/zefram/tmp/g0/foo\U110000bar.scm")'.
$

In this example I've created a script file and told guile-3.0(1) twice
to run it, and on one of the two attempts it signalled an error.
(The --no-auto-compile option doesn't affect the substantive result; I'm
just using it to avoid the noise that the caching system would otherwise
make.)  In the invocation that failed, one can see via strace(1) that
guile-3.0(1) doesn't use the filename for any file operations at all;
the error occurs at an earlier stage.  The error thus doesn't depend
on the file existing: if there is no file of that name then the same
error occurs.

This bug depends on the locale implied by the environment, and more
specifically on the character encoding nominated by the LC_CTYPE component
of the locale.  Nominating a locale that's not installed behaves the same
as nominating the C locale.  Part of my example above depends on having
the de_DE.utf8 and de_DE.iso88591 locales installed.  If you don't have
the specific locales that I used then you can get the same results as
me by substituting an installed locale that nominates the same encoding.
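
For anyone substituting locales, here is a rough sketch of how to check
which encoding a given locale nominates.  It assumes the POSIX locale(1)
utility is available.  The helper name charmap_of is made up for
illustration, and the locale names are just the ones from my example.

```shell
# List the locales installed on this system.
locale -a

# Hypothetical helper: print the character encoding nominated by locale $1.
# A locale that isn't installed falls back to the C locale's answer.
charmap_of() {
    LC_ALL="$1" locale charmap 2>/dev/null
}

charmap_of C
charmap_of de_DE.utf8        # should name UTF-8, if that locale is installed
charmap_of de_DE.iso88591    # should name ISO-8859-1, if that locale is installed
```

Any installed locale for which this reports the same encoding should be
a usable substitute in the commands above.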

This bug occurs when the locale nominates the UTF-8 encoding and the
supplied filename includes what looks like UTF-8 encoding of a codepoint
outside the Unicode range.  The filename as a whole doesn't need to have
the syntax of UTF-8 encoding of text.  (A different bug, Bug#1064437,
occurs if this error is not hit and the filename does not have the syntax
of locale-nominated encoding of text.)
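
To show why the four odd bytes in my example filename count as a
codepoint outside the Unicode range, here is a small sketch of the
arithmetic, decoding the sequence by hand according to the 4-byte UTF-8
layout.  The variable names are mine and purely illustrative; this says
nothing about how Guile itself does the decoding.

```shell
# Bytes of the odd sequence in the example filename foo\xf4\x90\x80\x80bar.scm.
b0=0xf4
b1=0x90
b2=0x80
b3=0x80

# A 4-byte UTF-8 sequence has the form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx;
# concatenating the x bits gives the codepoint.
cp=$(( (($b0 & 0x07) << 18) | (($b1 & 0x3f) << 12) | (($b2 & 0x3f) << 6) | ($b3 & 0x3f) ))

printf 'decoded: U+%X\n' "$cp"    # prints "decoded: U+110000"

# The Unicode range ends at U+10FFFF.
if [ "$cp" -gt $((0x10FFFF)) ]; then
    echo "beyond the Unicode range (maximum is U+10FFFF)"
fi
```

The result, 0x110000, matches the \U110000 that appears in the error
message above, and is one past the last Unicode codepoint.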

Preferably, guile-3.0(1) should use the script file specified on the
command line, regardless of locale.  It shouldn't be attempting to treat
the filename as encoded text.  Alternatively, if it cannot be made to
handle arbitrary filenames, then this limitation must be documented.

-zefram
