On 31 December 2015 at 12:58, Gavin Smith <[email protected]> wrote:
> On 31 December 2015 at 11:17, Werner LEMBERG <[email protected]> wrote:
>>
>>> I think the "Undefined control sequence" message comes for an active
>>> character (that is, a byte with category code 13) which doesn't have a
>>> definition. This would be hex c2.
>>
>> Can you really make byte 0xC2 active in luatex? This would completely
>> break Unicode support, since 0xC2 is an incomplete UTF-8 sequence.
>
> If you can't, it's big trouble for Texinfo's Unicode support for
> luatex. Do you know of any way to go back to byte-wise input instead
> of input by UTF-8 characters?
>
> Also I'm not sure in this case if it's 0xC2 that's active or the whole
> character 0xC2 0xA7. Do you know if each UTF-8 character has its own
> catcode?

So here's what's happening. When @documentencoding UTF-8 is given, Texinfo makes bytes 128-255 active. Under LuaTeX, however, the same code makes *characters* 128-255 active, because LuaTeX reads its input as UTF-8 characters rather than as individual bytes. The section and paragraph marks have Unicode character numbers below 256 (U+00A7 and U+00B6), so those characters are made active. The obelisk and diesis (dagger and double dagger) have character numbers above 256 (U+2020 and U+2021), so they aren't made active and are typeset as normal characters.

I don't have a solution to offer for this problem. Any suggestions?
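
To make the byte-versus-character distinction concrete, here is a small Python sketch (not TeX, and not part of Texinfo) that compares each mark's Unicode character number with its UTF-8 byte sequence. The names ACTIVE_RANGE, MARKS and describe are made up for this illustration, and I'm assuming "obelisk" and "diesis" mean the dagger (U+2020) and double dagger (U+2021).

```python
# Illustration only: which marks have a character number in 128-255,
# and what their UTF-8 bytes look like.
ACTIVE_RANGE = range(128, 256)  # the range made active, per the explanation above

# Assumed mapping of the four marks discussed in this thread.
MARKS = {
    "section": "\u00a7",
    "paragraph": "\u00b6",
    "dagger": "\u2020",
    "double dagger": "\u2021",
}


def describe(ch):
    """Return (character number, UTF-8 bytes, True if the number is in 128-255)."""
    cp = ord(ch)
    return cp, ch.encode("utf-8"), cp in ACTIVE_RANGE


if __name__ == "__main__":
    for name, ch in MARKS.items():
        cp, raw, active = describe(ch)
        hex_bytes = " ".join(f"{b:02x}" for b in raw)
        print(f"{name:14} U+{cp:04X}  bytes: {hex_bytes:9}  in 128-255: {active}")
```

Every one of these UTF-8 sequences starts with a byte in 128-255 (c2 or e2), so an engine reading byte by byte would see an active leading byte for all four marks. An engine reading whole characters only sees the first two as falling in the active range, which matches the behaviour described above.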
