Arthur Reutenauer wrote:
>> 0xC2 0x80, but its utf8 code is 0xE2 0x82 0xAC
>
> Yes, it simply means that you fool LuaTeX, which reads the file as if
> it was UTF-8, into thinking that it saw character U+0080, so that it
> prints character 0x80 to the output file, which turns out to be exactly
> what you want if you use the appropriate font. That's Unicode-heretic,
> of course, but a natural trick if you're familiar with the hacks people
> commonly used with 8-bit custom fonts in the pre-Unicode days, and I'm
> happy to learn that it actually works. PDF readers might even correctly
> interpret the text if they have valid Type 1 fonts (thanks to the glyph
> names) -- so that copy-pasting works, for example.
>
> It has nothing to do with the number of bytes a character needs in its
> UTF-8 form; LuaTeX simply reads the appropriate number of them and
> converts the byte stream to a list of Unicode characters upon opening
> the file.
>

I agree; in some sense it is the easiest way to make inputenc "work" with LuaTeX exactly the same way as it works with an 8-bit TeX.
But it means you also keep all the problems of the current inputenc over 8-bit TeX. I really wonder whether it's useful to keep that level of compatibility: it would mean no benefit on the encoding side for an unaware user who switches to the LuaTeX engine without also switching to "real" UTF-8 without inputenc...

Manuel.
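
For readers following the byte values quoted above, here is a minimal sketch of the arithmetic in Python (not LuaTeX code). It only illustrates what a UTF-8 decoder does with the two byte sequences from the thread. The choice of Windows-1252 as the 8-bit layout is an assumption made for illustration; the thread does not say which 8-bit font encoding is in use, only that slot 0x80 holds the desired glyph.

```python
# The two-byte sequence from the thread, as it would sit in the input file.
trick_bytes = bytes([0xC2, 0x80])

# A UTF-8 reader turns these two bytes into the single character U+0080.
decoded = trick_bytes.decode("utf-8")
code_point = ord(decoded)

# The genuine euro sign is U+20AC, which UTF-8 encodes as three bytes.
euro_utf8 = "\u20ac".encode("utf-8")

# ASSUMPTION: the 8-bit font is laid out like Windows-1252,
# where the euro sign occupies slot 0x80.
euro_slot = "\u20ac".encode("cp1252")[0]

print(f"decoded code point: U+{code_point:04X}")
print("euro in UTF-8:", [hex(b) for b in euro_utf8])
print("euro slot in the assumed 8-bit layout:", hex(euro_slot))
```

Under that assumption the decoded code point and the 8-bit slot coincide, which is why the trick yields the intended glyph even though U+0080 is not the euro sign in Unicode.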
