I've tracked down the cause of some encoding-related problems I
encountered with string literals containing special characters
(squared, cubed, and degree symbols) in source files encoded as
latin-1. I pass :external-format :latin-1 to compile-file, but the
resulting objects/fasls signal an error at load time, during their
initialization, indicating an attempt to decode latin-1 characters as
UTF-8. A peek at the .data file confirmed that the characters were
not UTF-8 encoded in the compiler output.

I've attached a patch which fixes this by changing DATA-C-DUMP to
supply :external-format :utf-8 to WT-FILTERED-DATA when writing the
data string, if unicode is enabled. I also changed the surrounding
WITH-OPEN-FILE to supply :external-format :latin-1, as a pun for
passthrough encoding: latin-1 maps character codes 0-255 straight to
the same byte values, so the already-encoded UTF-8 octets are written
unchanged. I'm sure this isn't necessary, but without the assurances
of using a binary stream (or knowledge of how :external-format
:default is interpreted throughout ECL), I wanted to pin it down to a
specific behavior.

I found another bug along the way, in the use of
sequence-output-streams in C::UTF8-ENCODED-STRING. If the output
vector has to be resized, the original vector is returned instead of
the newer, larger one. I fixed it by making the vector adjustable, so
that adjust-array calls replace-array and the original vector is
updated in place. An alternative would be an interface like that of
string output streams, with a function to retrieve the accumulated
result, but this way was a one-line change.
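
To illustrate why adjustability matters here, the following is a
minimal sketch in portable Common Lisp. It does not use ECL's
sequence-output-streams; PUSH-ALL and DEMO-ADJUSTABLE-BUFFER are
hypothetical names standing in for the stream writer and its caller.
The point is only that a writer can grow an adjustable vector while the
caller keeps using the reference it created.

```lisp
;; Hypothetical stand-in for the stream writer: it appends to BUFFER
;; and returns nothing, so the caller must rely on its own reference.
(defun push-all (chars buffer)
  (dolist (c chars)
    ;; VECTOR-PUSH-EXTEND grows an adjustable vector when it is full.
    (vector-push-extend c buffer))
  (values))

;; Hypothetical stand-in for the caller: it creates a deliberately
;; undersized buffer, lets the writer fill it, then reads it back.
(defun demo-adjustable-buffer ()
  (let ((buffer (make-array 1 :element-type 'base-char
                              :adjustable t
                              :fill-pointer 0)))
    (push-all '(#\a #\b #\c) buffer)
    ;; The caller's reference sees all three characters, because an
    ;; adjustable array keeps its identity when it is resized.
    (length buffer)))
```

Calling (demo-adjustable-buffer) returns 3 even though the buffer
started with room for one element.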

Here's an example of how it returns the wrong result if the initial
size estimate (* 1.2 length) is too small:

> (length (c::utf8-encoded-string (string (code-char 179))))
0    ; should be 2.

> (length (c::utf8-encoded-string
           (concatenate 'string (string (code-char 179)) ".")))
2    ; should be 3.

> (length (c::utf8-encoded-string
           (concatenate 'string (string (code-char 179)) "..")))
4    ; correct.
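
The "should be" values above can be cross-checked with a small portable
sketch. UTF8-OCTET-COUNT, INITIAL-BUFFER-SIZE and ESTIMATE-SUFFICIENT-P
are hypothetical helpers, not part of ECL, and the sketch assumes that
CHAR-CODE returns Unicode code points (as on a unicode-enabled build).

```lisp
;; Number of octets needed to encode STRING as UTF-8.
;; Assumption: CHAR-CODE yields Unicode code points.
(defun utf8-octet-count (string)
  (loop for ch across string
        for code = (char-code ch)
        sum (cond ((< code #x80) 1)      ; ASCII
                  ((< code #x800) 2)     ; includes latin-1 above 127
                  ((< code #x10000) 3)
                  (t 4))))

;; The same estimate UTF8-ENCODED-STRING uses for its output vector.
(defun initial-buffer-size (string)
  (round (* 1.2 (length string))))

;; True when the estimate is large enough that no resize is needed.
(defun estimate-sufficient-p (string)
  (>= (initial-buffer-size string) (utf8-octet-count string)))
```

For the three strings in the example, the estimate is 1, 2 and 4
elements against 2, 3 and 4 required octets, so only the third case
avoids the resize that triggers the bug.
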
diff --git a/src/cmp/cmpc-wt.lsp b/src/cmp/cmpc-wt.lsp
index 47e0bb7..0670acf 100644
--- a/src/cmp/cmpc-wt.lsp
+++ b/src/cmp/cmpc-wt.lsp
@@ -127,6 +127,7 @@
 (defun utf8-encoded-string (string)
   (let* ((output (make-array (round (* 1.2 (length string)))
 			     :element-type 'base-char
+                             :adjustable t
 			     :fill-pointer 0))
 	 (stream (make-sequence-output-stream output :external-format :utf-8)))
     (write-string string stream)
diff --git a/src/cmp/cmpwt.lsp b/src/cmp/cmpwt.lsp
index cc6ac37..6fa6186 100644
--- a/src/cmp/cmpwt.lsp
+++ b/src/cmp/cmpwt.lsp
@@ -84,13 +84,17 @@
 
 (defun data-c-dump (filename)
   (with-open-file (stream filename :direction :output :if-does-not-exist :create
-                          :if-exists :supersede :external-format :default)
+                          :if-exists :supersede
+                          :external-format :latin-1)
     (let ((string (data-dump-array)))
       (if (and *compile-in-constants* (plusp (length string)))
 	  (let ((*wt-string-size* 0)
 		(*wt-data-column* 80))
 	    (princ "static const char compiler_data_text[] = " stream)
-	    (wt-filtered-data string stream)
+	    (wt-filtered-data string stream
+                              :external-format
+                              #+unicode :utf-8
+                              #-unicode :default)
 	    (princ #\; stream)
 	    (format stream "~%#define compiler_data_text_size ~D~%"
 		    *wt-string-size*))
_______________________________________________
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list
