Stefan Monnier <[EMAIL PROTECTED]> writes:

>>     (while (string-match "\\^\\{2,\\}\\(\\([EMAIL PROTECTED])\\|[8-9a-f][0-9a-f]\\)"
>>                          string)
>>       (setq output
>>             (concat output
>>                     (regexp-quote (substring string
>>                                              0
>>                                              (- (match-beginning 1) 2)))
>>                     (if (match-beginning 2)
>>                         (concat
>>                          "\\(?:" (regexp-quote
>>                                   (substring string
>>                                              (- (match-beginning 1) 2)
>>                                              (match-end 0)))
>>                          "\\|"
>>                          (char-to-string
>>                           (logxor (aref string (match-beginning 2)) 64))
>>                          "\\)")
>>                       (char-to-string
>>                        (string-to-number (match-string 1 string) 16))))
>>             string (substring string (match-end 0))))
>>     (setq output (concat output (regexp-quote string)))
>>     (if (featurep 'mule)
>>         (prog2
>>             (message "%S %S " output buffer-file-coding-system)
>>             (setq output (decode-coding-string output
>>                                                buffer-file-coding-system))
>>           (message "%S\n" output))
>>       output)))
>
> The problem is that by passing `output' to decode-coding-string you
> clearly consider `output' to be a sequence of bytes.  But to
> construct `output' you use pieces of `string' so you have to make
> sure that `string' is also a sequence of bytes.  Assuming `string'
> comes from the TeX process, you can do that by making sure that that
> process's output coding system is `binary' (or `raw-text' if you
> want EOL-conversion).

I already mentioned that this _is_ exactly what we do already: the
problem is that some TeX systems are set up to quote _some_ bytes from
utf-8 in the ^^xx hexadecimal notation, and let some bytes through
unchanged.  It is completely braindead.

The funny thing is that with the _mixed_ representation, the hard
case, this code worked.  But with the _complete_ ASCII transcription,
it doesn't.  I have to experiment a bit with things like
string-as-multibyte and stuff to find out what combination will be
right all of the time.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
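
To make the two representations concrete, here is a minimal sketch of the byte-level idea only, not the code under discussion. The name `my-tex-unquote-bytes' is hypothetical, and the sketch assumes a recent Emacs (it uses `unibyte-string'), a unibyte input string as read with a `binary' coding system, and utf-8 as the target encoding. It handles only the two-digit ^^xx form for high bytes, not the single-character ^^X form matched by the regexp quoted above.

```elisp
;; Hypothetical helper for illustration; not part of the original code.
(defun my-tex-unquote-bytes (string)
  "Replace each ^^xx in STRING by the byte it denotes.
Only xx in the range 80..ff (lowercase hex) is handled.
STRING is assumed to be unibyte; the result is a unibyte string."
  (let ((case-fold-search nil)   ; accept lowercase hex digits only
        (pos 0)
        (pieces nil))
    (while (string-match "\\^\\^\\([89a-f][0-9a-f]\\)" string pos)
      ;; Text before the quoted byte is copied through unchanged.
      (push (substring string pos (match-beginning 0)) pieces)
      ;; The two hex digits become one raw byte.
      (push (unibyte-string
             (string-to-number (match-string 1 string) 16))
            pieces)
      (setq pos (match-end 0)))
    (push (substring string pos) pieces)
    (apply #'concat (nreverse pieces))))

;; Mixed representation: first byte raw, second byte quoted.
(decode-coding-string (my-tex-unquote-bytes "\303^^a4") 'utf-8)

;; Complete ASCII transcription: both bytes quoted.
(decode-coding-string (my-tex-unquote-bytes "^^c3^^a4") 'utf-8)
```

Under those assumptions both calls should produce the same one-character string (U+00E4), because every piece stays unibyte until the single `decode-coding-string' at the end.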