On Sun, 30 Jan 2011 17:27:05 -0500
Matthew Mondor <mm_li...@pulsar-zone.net> wrote:
> On Sun, 30 Jan 2011 23:25:21 +0100
> Juan Jose Garcia-Ripoll <juanjose.garciarip...@googlemail.com> wrote:
>
> > stream-{en,de}coding-error-stream
> > stream-{en,de}coding-error-external-format
> > stream-encoding-error-stream-code
> > stream-decoding-error-stream-octets
>
> Awesome, I'm looking forward to test the new code soon.
Today I tried the new interface. I had a few problems:
- The error signaled appears to be si::stream-decoding-error (internal
symbol)
- Somehow I get an error if trying to use the
stream-decoding-error-stream-octets accessor. I tried both
si:stream-decoding-error-stream-octets and
si::stream-decoding-error-stream-octets but get an unknown function
error. However, it appears that the parent
si:character-decoding-error-octets can be invoked;
- The octets provided by si:character-decoding-error-octets appear to
have been tampered with. I obtain two octets per LATIN-1 character
found in the stream, and these octets are much lower than the actual
values (for the character 0xe9, the two octets returned were (9 99));
- Although I did see a USE-VALUE restart in the debugger, I could not
find the corresponding restart function to execute from a HANDLER-BIND
- A resync restart appears to be missing, one that would allow reading
to continue without having to supply a use-value. For instance, the
reader function might itself want to add every illegal octet > 127
to its input, treating them as LATIN-1 or mapping them to a special
Unicode range. There could also be more than one octet in the
sequence, in which case use-value cannot be used (unless it can
accept multiple characters). SBCL uses the sb-int:attempt-resync
restart for that.
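
On the two restart items: in portable Common Lisp a handler does not
need a dedicated restart function. It can look a restart up by name with
FIND-RESTART, or list what is visible with COMPUTE-RESTARTS. The
following is only a minimal sketch of that technique. The names
DEMO-DECODING-ERROR, DEMO-READ-CHAR and the local ATTEMPT-RESYNC restart
are hypothetical stand-ins, not ECL's or SBCL's actual interface. That
ECL names its restart with the standard symbol CL:USE-VALUE is an
assumption that would need checking.

```lisp
;; Hypothetical stand-in for a decoding error (not an ECL/SBCL condition).
(define-condition demo-decoding-error (error)
  ((octets :initarg :octets :reader demo-decoding-error-octets)))

;; Hypothetical reader: signals the error with two restarts established.
(defun demo-read-char (octets)
  (restart-case (error 'demo-decoding-error :octets octets)
    (use-value (c)
      :report "Return a replacement character."
      c)
    (attempt-resync ()
      :report "Skip the invalid octets and continue."
      nil)))

;; Return the names of the restarts a handler can see for this error.
(defun demo-list-restarts (octets)
  (handler-bind ((demo-decoding-error
                   (lambda (e)
                     (return-from demo-list-restarts
                       (mapcar #'restart-name (compute-restarts e))))))
    (demo-read-char octets)))

;; Invoke USE-VALUE from a handler, but only if it is actually available.
(defun demo-replace (octets)
  (handler-bind ((demo-decoding-error
                   (lambda (e)
                     (let ((r (find-restart 'use-value e)))
                       (when r
                         (invoke-restart r #\?))))))
    (demo-read-char octets)))
```

Calling (demo-list-restarts '(233)) from the real handler's position
would show which restart names, and in which packages, are on offer.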
I attach a test/example case:
- InvalidUTF8.txt contains both UTF-8 and ISO-8859-15 text.
- custom-read-line.lisp contains a custom function to read a line,
which attempts to convert to LATIN-1 the invalid UTF-8 sequences as
an example. It shows how that is done using SBCL as well.
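
The conversion used in custom-read-line.lisp amounts to mapping each
invalid octet above 127 to the character with the same code. A minimal
sketch, assuming the implementation's character codes coincide with
LATIN-1 below 256 (which should hold for Unicode builds).
DEMO-OCTETS-AS-LATIN-1 is a hypothetical helper and is not part of the
attachment:

```lisp
;; Hypothetical helper: keep only octets above 127 and map each one to
;; the character with the same code (assumes codes < 256 match LATIN-1).
(defun demo-octets-as-latin-1 (octets)
  (map 'string #'code-char
       (remove-if-not (lambda (o) (> o 127)) octets)))
```

With correct octets, the single octet 233 (0xe9) would yield a
one-character string whose character code is 233.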
Perhaps I can eventually take more time to suggest diffs, but I wanted
to follow up first in case I missed something obvious.
Thanks,
--
Matt
Ceci est une phrase écrite en Français utilisant l'encodage UTF-8.
Ceci est une phrase écrite en Français utilisant l'encodage ISO-8859-15.
(defun custom-read-line (stream &key (max 512))
  (let ((line (make-array max
                          :element-type 'character
                          :adjustable t
                          :fill-pointer 0)))
    (flet ((add-char (c)
             (declare (type character c))
             ;; VECTOR-PUSH-EXTEND grows the adjustable vector past MAX;
             ;; plain VECTOR-PUSH would silently drop characters once full.
             (vector-push-extend c line))
           (finalize-line ()
             (let ((len (length line)))
               (when (and (> len 0)
                          (char= #\Return (aref line (1- len))))
                 (vector-pop line)))
             line))
      (loop
        do
           (let (
                 ;; No way to determine invalid octet values with old ECL,
                 ;; Return an unknown character code
                 (c #+old-ecl(handler-case
                                 (read-char stream)
                               (simple-error ()
                                 #\UFFFD))
                    ;; SBCL provides invalid octets which we can import and
                    ;; then issue an ATTEMPT-RESYNC restart to resume
                    #+sbcl(handler-bind
                              ((sb-int:stream-decoding-error
                                 #'(lambda (e)
                                     ;; Treat invalid UTF-8 octets as
                                     ;; ISO-8859 characters.
                                     (mapc #'(lambda (c)
                                               (when (> c 127)
                                                 (add-char (code-char c))))
                                           (sb-int:character-decoding-error-octets e))
                                     (invoke-restart 'sb-int:attempt-resync))))
                            (read-char stream))
                    ;; Test with new ECL
                    #+ecl(handler-bind
                             ((si::stream-decoding-error ; Internal
                                #'(lambda (e)
                                    (mapc #'(lambda (c)
                                              (format t "~%~A~%" c)
                                              (when (> c 127)
                                                ;; Never happens
                                                (add-char (code-char c))))
                                          ;; Not advertised interface?
                                          (si:character-decoding-error-octets
                                           e))
                                    ;; No restart function?
                                    ;; USE-VALUE not found, ATTEMPT-RESYNC
                                    ;; either...
                                    ;(invoke-restart 'si:use-value #\?)
                                    )))
                           (read-char stream)))
                 )
             (when (char= #\Newline c)
               (return (values (finalize-line) t)))
             (add-char c))))))
(defun test ()
  (with-open-file (stream "/tmp/InvalidUTF8.txt")
    (loop
      do
         (let ((line (handler-case
                         (custom-read-line stream)
                       (end-of-file ()
                         (loop-finish)))))
           (format t "~A~%" line)))))
_______________________________________________
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list