> BTW, WRT spotting multi-byte UTF-8 encoding, I don't think that's a
> goer. Valid UTF-8 and valid GB2312 can share the same sequences,
> especially if it's just the odd `£' or `拢' in ASCII text.

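To make the quoted example concrete, here is a minimal Python sketch. It assumes Python's built-in `utf-8` and `gb2312` codecs behave like the decoders in question, and the helper name `decodes_as` is made up for this illustration; none of it is nmh code. It checks that the two UTF-8 bytes for `£' are also a valid GB2312 sequence, and that some byte sequences are rejected by a strict decoder.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`.

    Hypothetical helper for illustration: it only asks whether a strict
    decode succeeds, not whether the result is the intended text.
    """
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# U+00A3 POUND SIGN encoded as UTF-8 is the two bytes 0xC2 0xA3.
pound_utf8 = "£".encode("utf-8")

# The same two bytes are valid in both encodings, so validity alone
# cannot tell the two character sets apart here.
valid_in_both = decodes_as(pound_utf8, "utf-8") and decodes_as(pound_utf8, "gb2312")
as_gb2312 = pound_utf8.decode("gb2312")

# A lone 0xA3 (the Latin-1 pound sign) is a stray continuation byte,
# so it is not valid UTF-8.
lone_byte_is_utf8 = decodes_as(b"\xa3", "utf-8")

# 0xC2 followed by an ASCII byte is rejected by both decoders.
bad_pair = b"\xc2\x41"
bad_pair_is_utf8 = decodes_as(bad_pair, "utf-8")
bad_pair_is_gb2312 = decodes_as(bad_pair, "gb2312")
```

A strict decode like this can only rule an encoding out; when a sequence passes for both, as the pound sign does, it says nothing about which one the sender meant.
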
It was just a suggestion, and not one I was particularly crazy about ... but not all arbitrary 8-bit sequences are valid UTF-8. It looks like detection would be harder for GB2312 (using the EUC-CN encoding, right?), but there are certainly invalid sequences for GB2312 as well. That said, I do not think this is a business we should be in: pick your locale properly, or explicitly specify a character set in the draft.

--Ken

_______________________________________________
Nmh-workers mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/nmh-workers
