In article <[EMAIL PROTECTED]>, David Kastrup <[EMAIL PROTECTED]> writes:

> I have the problem that within preview-latex there is a function that
> assembles UTF-8 strings from single characters.  This function, when
> used manually, mostly works.  It is called within a process sentinel
> and fails rather consistently there with a current CVS Emacs.  I
> include the code here since I don't know what might be involved here:
> regexp-quote, substring, char-to-string etc.  The starting string is
> taken from a buffer containing only ASCII (inserted by a process with
> coding-system 'raw-text).
It seems that you are caught in a trap of automatic unibyte->multibyte
conversion.

> (defun preview-error-quote (string)
>   "Turn STRING with potential ^^ sequences into a regexp.
> To preserve sanity, additional ^ prefixes are matched literally,
> so the character represented by ^^^ preceding extended characters
> will not get matched, usually."
>   (let (output case-fold-search)
>     (while (string-match "\\^\\{2,\\}\\(\\([EMAIL PROTECTED])\\|[8-9a-f][0-9a-f]\\)"
>                          string)
>       (setq output
>             (concat output
>                     (regexp-quote (substring string
>                                              0
>                                              (- (match-beginning 1) 2)))

If STRING is taken from a multibyte buffer, it is a multibyte string.
Thus, the above substring also returns a multibyte string.

>                     (if (match-beginning 2)
>                         (concat
>                          "\\(?:" (regexp-quote
>                                   (substring string
>                                              (- (match-beginning 1) 2)
>                                              (match-end 0)))
>                          "\\|"
>                          (char-to-string
>                           (logxor (aref string (match-beginning 2)) 64))
>                          "\\)")
>                       (char-to-string
>                        (string-to-number (match-string 1 string) 16))))

But this char-to-string produces a unibyte string.  So, on
concatenating them, this unibyte string is automatically converted to
multibyte by the string-make-multibyte function, which usually
produces a multibyte string containing Latin-1 characters.

>             string (substring string (match-end 0))))
>     (setq output (concat output (regexp-quote string)))
>     (if (featurep 'mule)
>         (prog2
>             (message "%S %S " output buffer-file-coding-system)
>             (setq output (decode-coding-string output
>                                                buffer-file-coding-system))

And this decode-coding-string treats the internal byte sequence of the
multibyte string OUTPUT as UTF-8, so you get garbage.

> Unfortunately, when I call this stuff by hand instead from the
> process-sentinel, it mostly works

That is because the string you give to preview-error-quote is a
unibyte string in that case.  The Lisp reader generates a unibyte
string when it reads an ASCII-only string.  Ex:

  (multibyte-string-p "abc") => nil

This will also return an incorrect string:
  (preview-error-quote
   (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))

So, the easiest fix is to put

  (setq string (string-as-unibyte string))

at the head of preview-error-quote.

---
Ken'ichi HANDA
[EMAIL PROTECTED]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
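The difference between the failing path and the suggested fix can be modeled outside Emacs.  The sketch below is written in Python so that it stands alone, and it rests on several assumptions: the function names are hypothetical, only the `^^hh` (hex byte) branch of the regexp is modeled (the `^^X` xor-64 branch and regexp-quote are left out), and Emacs unibyte and multibyte strings are approximated by Python `bytes` and `str`.  `expand_as_latin1` mimics each byte from char-to-string being promoted to a Latin-1 character; `expand_then_decode` mimics working on a unibyte string and decoding as UTF-8 only once at the end.

```python
import re

# Hypothetical model: matches ^^hh escapes for bytes 0x80-0xff, like the
# "[8-9a-f][0-9a-f]" branch of the regexp in preview-error-quote.
CARET_HEX = re.compile(rb"\^\^([89a-f][0-9a-f])")


def _splice_bytes(text: bytes) -> bytes:
    """Replace every ^^hh escape with the single raw byte it denotes."""
    return CARET_HEX.sub(lambda m: bytes([int(m.group(1), 16)]), text)


def expand_as_latin1(text: bytes) -> str:
    """Failing path: each spliced byte ends up as its own Latin-1
    character, as when a unibyte string is promoted to multibyte."""
    return _splice_bytes(text).decode("latin-1")


def expand_then_decode(text: bytes) -> str:
    """Fixed path: keep everything as raw bytes (unibyte) while
    splicing, then decode the whole result as UTF-8 exactly once."""
    return _splice_bytes(text).decode("utf-8")
```

For the line from the thread, `b"r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"`, `expand_as_latin1` yields `erhÃ¶ht` where `expand_then_decode` yields `erhöht`: the two bytes C3 B6 only form "ö" if they are still bytes when the UTF-8 decoding happens.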