On Sat, 12 Feb 2011 19:07:43 +0100
Juan Jose Garcia-Ripoll <juanjose.garciarip...@googlemail.com> wrote:
> Thanks for the detailed report. I made some changes.
>
> * The exported symbols come from the EXT package. They are
Indeed, SI and EXT appear to be aliases; however, when a condition type
is printed, SI appears to take precedence over EXT. For instance:
decoding error on stream #<input stream "/tmp/InvalidUTF8.txt">
(:EXTERNAL-FORMAT (:UTF-8 :LF)):
the octet sequence (233 99) cannot be decoded.
[Condition of type SI:STREAM-DECODING-ERROR]
Of course, that's only a detail. I see that the symbol is now
external, nice.
> * Two restarts are provided USE-VALUE and CONTINUE. They can be used via the
> ANSI functions with the same name (I think you missed that point regarding
> USE-VALUE)
Indeed, I hadn't noticed the ANSI USE-VALUE function until my
second post. I now see a CONTINUE restart as well.
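
For reference, a minimal portable sketch of how the ANSI functions
USE-VALUE and CONTINUE reach restarts of the same name. The names
DEMO-DECODING-ERROR and DEMO-DECODE are hypothetical stand-ins for
illustration only; this is not ECL's stream code, and it assumes the
implementation has characters for codes above 127.

```lisp
;; Hypothetical condition standing in for EXT:STREAM-DECODING-ERROR.
(define-condition demo-decoding-error (error)
  ((octets :initarg :octets :reader demo-decoding-error-octets)))

;; Hypothetical decoder: octets above 127 signal an error and offer
;; the USE-VALUE and CONTINUE restarts.
(defun demo-decode (octet)
  (if (< octet 128)
      (code-char octet)
      (restart-case (error 'demo-decoding-error :octets (list octet))
        (use-value (replacement)
          :report "Supply a replacement character."
          replacement)
        (continue ()
          :report "Skip the invalid octet."
          nil))))

;; The handler calls the ANSI function USE-VALUE, which invokes the
;; restart of that name; here the octet is reinterpreted as ISO-8859.
(defun decode-with-replacement (octets)
  (handler-bind ((demo-decoding-error
                   (lambda (e)
                     (use-value
                      (code-char (first (demo-decoding-error-octets e)))
                      e))))
    (coerce (remove nil (mapcar #'demo-decode octets)) 'string)))

;; The handler calls the ANSI function CONTINUE, dropping the octet.
(defun decode-skipping (octets)
  (handler-bind ((demo-decoding-error (lambda (e) (continue e))))
    (coerce (remove nil (mapcar #'demo-decode octets)) 'string)))

;; (decode-skipping '(97 233 98)) => "ab"
```

With DECODE-WITH-REPLACEMENT the same input yields a three-character
string whose middle character has code 233.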
> * I am not likely to provide multi-character restarts for a simple reason:
> ECL's streams are too simple, not providing arbitrary push-back buffers for
> bytes. Having a USE-VALUE restart that returns more than one character may
> lead to unexpected problems with unread-char and other functions -- I do not
> mean it is impossible but it simply complicates the interface and right now
> I have no clear idea how to do that.
I agree that it's unnecessary: as long as the code can obtain the
invalid sequences and resume reading at that point, it should be fine.
So I gave the new changes a quick try. It's much better, although a
character is still getting lost after the CONTINUE restart, even if I
consume all of the invalid octets supplied. New test code is
attached. Also, in theory there is only a single invalid byte in a row
in that stream, while two invalid octets are supplied per occurrence,
but that's a detail as long as the CONTINUE restart doesn't lose bytes.
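
One plausible reading of the two reported octets (233 99), offered as
an assumption and not checked against the ECL source: in UTF-8, 233
(#xE9) is the lead byte of a three-byte sequence, so a decoder would
read the following byte, 99 (#x63, the "c" of "écrite"), before finding
that it is not a continuation byte. The hypothetical helper below only
classifies octets by their high bits; it is a simplified sketch and
does not reject overlong lead bytes such as #xC0.

```lisp
;; Hypothetical helper: classify an octet by its UTF-8 bit pattern.
(defun utf-8-octet-class (octet)
  (cond ((< octet #x80) :ascii)         ; 0xxxxxxx
        ((< octet #xC0) :continuation)  ; 10xxxxxx
        ((< octet #xE0) :lead-2)        ; 110xxxxx
        ((< octet #xF0) :lead-3)        ; 1110xxxx
        ((< octet #xF8) :lead-4)        ; 11110xxx
        (t :invalid)))

;; (mapcar #'utf-8-octet-class '(233 99)) => (:LEAD-3 :ASCII)
```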
Thanks,
--
Matt
Ceci est une phrase écrite en Français utilisant l'encodage UTF-8.
Ceci est une phrase écrite en Français utilisant l'encodage ISO-8859-15.
(defun custom-read-line (stream &key (max 512))
  (let ((line (make-array max
                          :element-type 'character
                          :adjustable t
                          :fill-pointer 0)))
    (macrolet ((add-char (c)
                 `(vector-push ,c line)))
      (flet ((finalize-line ()
               (loop
                  for c = (vector-pop line)
                  while (member c '(#\Return #\Newline))
                  finally (vector-push c line))
               line))
        (loop
           do
           (let (
                 ;; No way to determine invalid octet values with old ECL,
                 ;; Return an unknown character code
                 (c #+old-ecl(handler-case
                                 (read-char stream)
                               (simple-error ()
                                 #\UFFFD))
                    ;; SBCL provides invalid octets which we can import and
                    ;; then issue an ATTEMPT-RESYNC restart to resume
                    #+sbcl(handler-bind
                              ((sb-int:stream-decoding-error
                                #'(lambda (e)
                                    ;; Treat invalid UTF-8 octets as
                                    ;; ISO-8859 characters.
                                    (mapcar #'(lambda (c)
                                                (when (> c 127)
                                                  (add-char (code-char c))))
                                            (sb-int:character-decoding-error-octets e))
                                    (invoke-restart 'sb-int:attempt-resync))))
                            (read-char stream))
                    ;; Test with new ECL
                    #+ecl(handler-bind
                             ((ext:stream-decoding-error
                               #'(lambda (e)
                                   (format t "~A~%"
                                           (mapcar #'restart-name
                                                   (compute-restarts e)))
                                   ;; Treat invalid UTF-8 octets as
                                   ;; ISO-8859 characters.
                                   (mapcar #'(lambda (c)
                                               (format t "~%~A~%" c)
                                               (add-char (code-char c)))
                                           (ext:character-decoding-error-octets e))
                                   (invoke-restart 'continue))))
                           (read-char stream))))
             (when (char= #\Newline c)
               (return (values (finalize-line) t)))
             (add-char c)))))))
(defun test ()
  (with-open-file (stream "/tmp/InvalidUTF8.txt")
    (loop
       do
       (let ((line (handler-case
                       (custom-read-line stream)
                     (end-of-file ()
                       (loop-finish)))))
         (format t "~A~%" line)))))
_______________________________________________
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list