Kenichi Handa <[EMAIL PROTECTED]> writes:

> In article <[EMAIL PROTECTED]>, David Kastrup <[EMAIL PROTECTED]> writes:
>
>> I have the problem that within preview-latex there is a function
>> that assembles UTF-8 strings from single characters. This
>> function, when used manually, mostly works.
>
> It seems that you are caught in a trap of automatic
> unibyte->multibyte conversion.
>
>> (defun preview-error-quote (string)
>>   "Turn STRING with potential ^^ sequences into a regexp.
>> To preserve sanity, additional ^ prefixes are matched literally,
>> so the character represented by ^^^ preceding extended characters
>> will not get matched, usually."
>>   (let (output case-fold-search)
>>     (while (string-match "\\^\\{2,\\}\\(\\([EMAIL PROTECTED])\\|[8-9a-f][0-9a-f]\\)"
>>                          string)
>>       (setq output
>>             (concat output
>>                     (regexp-quote (substring string
>>                                              0
>>                                              (- (match-beginning 1) 2)))
>
> If STRING is taken from a multibyte buffer, it is a
> multibyte string. Thus, the above substring also returns a
> multibyte string.
>
>>                     (char-to-string
>>                      (string-to-number (match-string 1 string) 16))))
>
> But this char-to-string produces a unibyte string. So, on
> concatenating them, this unibyte string is automatically converted
> to multibyte by the string-make-multibyte function, which usually
> produces a multibyte string containing latin-1 chars.
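A minimal sketch of a byte-oriented way around this trap, assuming GNU Emacs 23 or later (not the Emacs or XEmacs 21.4 versions discussed in this thread) and using a hypothetical helper name, `my-hat-hex-decode`. Instead of building the result from char-to-string pieces and concat, it collects every byte in a list, turns the list into a unibyte string with `unibyte-string`, and decodes that once at the end, so no implicit unibyte->multibyte conversion can reinterpret the bytes as latin-1 chars. It gets at the bytes with `encode-coding-string` rather than `string-as-unibyte`, which newer Emacs versions mark as obsolete. It only decodes the ^^xx escapes and does not build a regexp the way preview-error-quote does.

```elisp
(defun my-hat-hex-decode (string coding)
  "Replace TeX-style ^^xx escapes in STRING by the bytes they denote.
The resulting byte sequence is decoded once with CODING.  Hypothetical
helper for illustration; xx must be two hex digits in the range 80-ff."
  (let* ((raw (encode-coding-string string coding)) ; unibyte: one byte per element
         (len (length raw))
         (i 0)
         (bytes '())
         (case-fold-search nil))
    (while (< i len)
      (if (and (<= (+ i 4) len)
               (string-match-p "\\`\\^\\^[89a-f][0-9a-f]"
                               (substring raw i (+ i 4))))
          (progn
            ;; "^^xx" stands for the single byte with hex value xx.
            (push (string-to-number (substring raw (+ i 2) (+ i 4)) 16) bytes)
            (setq i (+ i 4)))
        ;; Any other byte is copied unchanged.
        (push (aref raw i) bytes)
        (setq i (1+ i))))
    ;; `unibyte-string' yields a unibyte string, so nothing gets
    ;; reinterpreted as latin-1 before the single decoding step.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) coding)))
```

For example, (my-hat-hex-decode "erh^^c3^^b6ht" 'utf-8) gives "erhöht". Keeping the data as raw bytes until one final decode-coding-string call is the point of the design; whether something similar works on XEmacs is untested here.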
Oh. Latin-1 chars. Can't I tell char-to-string to produce the same
sort of raw-marked chars that raw-text (as process coding system)
appears to produce?

>> (setq output (decode-coding-string output buffer-file-coding-system))
>
> And this decode-coding-string treats the internal byte
> sequence of a multibyte string OUTPUT as utf-8, thus you get
> some garbage.
>
>> Unfortunately, when I call this stuff by hand instead of from the
>> process-sentinel, it mostly works
>
> That is because the string you give to preview-error-quote
> is a unibyte string in that case. The Lisp reader generates
> a unibyte string when it sees an ASCII-only string.
>
> Ex: (multibyte-string-p "abc") => nil
>
> This will also return an incorrect string.
>
> (preview-error-quote
>  (string-to-multibyte "r Weise $f$ um~$1$ erh^^c3^^b6ht und $e$"))
>
> So, the easiest fix will be to do:
>   (setq string (string-as-unibyte string))
> in the head of preview-error-quote.

Sigh. XEmacs-21.4-mule does not seem to have string-as-unibyte. I'll
have to see whether it happens to work without it on XEmacs. If not,
I'll have to come up with something else.

Thanks for the analysis!

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel