I've tracked down the cause of some encoding-related problems
encountered while using string literals containing special characters
(squared, cubed, and degree symbols) in source files encoded as
latin-1. I specify :external-format :latin-1 to compile-file, but the
resulting objects/fasls signal an error at load time, during their
initialization; the error indicates an attempt to decode latin-1
characters as UTF-8. A peek at the .data file confirmed that the
characters were not UTF-8 encoded in the compiler output.
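
To illustrate why the loader rejects those bytes, here is a minimal sketch in portable Common Lisp. The helper names UTF8-OCTETS and UTF8-CONTINUATION-OCTET-P are hypothetical, written for this message, and are not part of ECL. The sketch assumes code points below #x800, which covers the latin-1 range.

```lisp
;; Hypothetical helper (not part of ECL): the UTF-8 octets for one
;; code point below #x800, which is enough for the latin-1 range.
(defun utf8-octets (code)
  (assert (< -1 code #x800))
  (if (< code #x80)
      (list code)                                ; ASCII: one octet
      (list (logior #xC0 (ash code -6))          ; lead octet 110xxxxx
            (logior #x80 (logand code #x3F)))))  ; continuation 10xxxxxx

;; Octets of the form 10xxxxxx can only continue a UTF-8 sequence;
;; they can never start one.
(defun utf8-continuation-octet-p (octet)
  (= (logand octet #xC0) #x80))
```

For the cubed symbol, (utf8-octets 179) gives the two octets #xC2 #xB3, whereas latin-1 writes the single octet #xB3. That lone octet satisfies UTF8-CONTINUATION-OCTET-P, so a UTF-8 decoder meets a continuation octet with no lead octet before it. The same holds for the degree (176) and squared (178) symbols.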

I've attached a patch which fixes this by changing DATA-C-DUMP,
supplying :external-format :utf-8 to WT-FILTERED-DATA when writing the
data string if unicode is enabled. I also changed the surrounding
WITH-OPEN-FILE to supply :external-format :latin-1, as a pun for
passthrough encoding. I'm sure this isn't necessary, but without the
assurances of using a binary stream (or knowledge of how
:external-format :default is interpreted throughout ECL), I wanted to
pin it down to a specific behavior.

I found another bug along the way, in the use of
sequence-output-streams in C::UTF8-ENCODED-STRING. If the output
vector has to be resized, the original vector is returned instead of
the newer, larger vector. I fixed it by making the vector adjustable,
so adjust-array modifies the original vector in place (it calls
replace-array) and the caller sees the full result. An alternative
would be an interface like that of string output streams, with a
function to retrieve the accumulated result, but this way was a
one-line change.

Here's an example of how it returns the wrong result if the initial
size estimate, (round (* 1.2 (length string))), is too short:
> (length (c::utf8-encoded-string (string (code-char 179))))
0 ; should be 2.
> (length (c::utf8-encoded-string
           (concatenate 'string (string (code-char 179)) ".")))
2 ; should be 3.
> (length (c::utf8-encoded-string
           (concatenate 'string (string (code-char 179)) "..")))
4 ; correct.
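
The arithmetic behind those three cases can be checked with a small sketch in portable Common Lisp. The function names below are hypothetical, written for this message, and are not part of ECL; only the estimate formula is taken from C::UTF8-ENCODED-STRING.

```lisp
;; Hypothetical helper: number of octets STRING needs in UTF-8.
(defun utf8-octet-count (string)
  (loop for ch across string
        sum (let ((code (char-code ch)))
              (cond ((< code #x80) 1)
                    ((< code #x800) 2)
                    ((< code #x10000) 3)
                    (t 4)))))

;; Same formula as the initial buffer size in C::UTF8-ENCODED-STRING.
(defun initial-estimate (string)
  (round (* 1.2 (length string))))

;; True when the initial buffer is large enough, so no resize occurs.
(defun estimate-sufficient-p (string)
  (>= (initial-estimate string) (utf8-octet-count string)))
```

With (code-char 179) alone the estimate is 1 against 2 octets needed; with one trailing "." it is 2 against 3; with ".." it is 4 against 4. Only the last case avoids a resize, which matches the one correct result in the transcript.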
diff --git a/src/cmp/cmpc-wt.lsp b/src/cmp/cmpc-wt.lsp
index 47e0bb7..0670acf 100644
--- a/src/cmp/cmpc-wt.lsp
+++ b/src/cmp/cmpc-wt.lsp
@@ -127,6 +127,7 @@
 (defun utf8-encoded-string (string)
   (let* ((output (make-array (round (* 1.2 (length string)))
                              :element-type 'base-char
+                             :adjustable t
                              :fill-pointer 0))
          (stream (make-sequence-output-stream output :external-format :utf-8)))
     (write-string string stream)
diff --git a/src/cmp/cmpwt.lsp b/src/cmp/cmpwt.lsp
index cc6ac37..6fa6186 100644
--- a/src/cmp/cmpwt.lsp
+++ b/src/cmp/cmpwt.lsp
@@ -84,13 +84,17 @@
 (defun data-c-dump (filename)
   (with-open-file (stream filename :direction :output :if-does-not-exist :create
-                          :if-exists :supersede :external-format :default)
+                          :if-exists :supersede
+                          :external-format :latin-1)
     (let ((string (data-dump-array)))
       (if (and *compile-in-constants* (plusp (length string)))
           (let ((*wt-string-size* 0)
                 (*wt-data-column* 80))
             (princ "static const char compiler_data_text[] = " stream)
-            (wt-filtered-data string stream)
+            (wt-filtered-data string stream
+                              :external-format
+                              #+unicode :utf-8
+                              #-unicode :default)
             (princ #\; stream)
             (format stream "~%#define compiler_data_text_size ~D~%"
                     *wt-string-size*))
_______________________________________________
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list