Raphael Geissert <[EMAIL PROTECTED]> writes:

> Yeah, that's another known issue. DEHS currently treats all its
> input data as ISO-8859-1 and converts it into UTF-8 without first
> checking if the input is already UTF-8.

Since there's no way to deduce the encoding of a byte stream for
certain, and the heuristics are both complicated and prone to false
positives, it would be better not to "check" the encoding of a text
file in the absence of an explicit declaration of its encoding.

It would be better to assume the input is UTF-8, and to encourage all
authors of such files to encode them using Debian's de facto (and, in
increasingly many areas, de jure) standard of UTF-8.

> Will try to fix it next time I dig on DEHS' code.

Appreciated.

-- 
 \     "Some people, when confronted with a problem, think 'I know, |
  `\   I'll use regular expressions'. Now they have two problems."  |
_o__)                   - Jamie Zawinski, in alt.religion.emacs     |
Ben Finney
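
To make the difference concrete, here is a minimal Python sketch of the two behaviours discussed above. It is not taken from DEHS; the function names legacy_convert and decode_assuming_utf8 and the sample string are hypothetical, chosen only for illustration. The first function reproduces the blind ISO-8859-1 to UTF-8 conversion described in the quoted text. The second honours an explicit declaration if there is one and otherwise assumes strict UTF-8, with no guessing.

```python
from typing import Optional


def legacy_convert(raw: bytes) -> bytes:
    """Hypothetical stand-in for the reported behaviour:
    treat every input as ISO-8859-1 and re-encode it as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


def decode_assuming_utf8(raw: bytes, declared: Optional[str] = None) -> str:
    """Use the declared encoding when one is given; otherwise assume
    UTF-8. Strict decoding raises UnicodeDecodeError on invalid input
    instead of silently guessing."""
    return raw.decode(declared or "utf-8")


if __name__ == "__main__":
    utf8_input = "café".encode("utf-8")

    # Input that is already UTF-8 gets double-encoded by the blind conversion.
    print(legacy_convert(utf8_input).decode("utf-8"))  # cafÃ©

    # Assuming UTF-8 leaves it intact.
    print(decode_assuming_utf8(utf8_input))  # café

    # Undeclared ISO-8859-1 input is rejected rather than mis-decoded.
    try:
        decode_assuming_utf8(b"caf\xe9")
    except UnicodeDecodeError as err:
        print("not valid UTF-8:", err.reason)
```

With this approach a file in another encoding fails loudly unless its encoding is declared, which gives its author a reason to switch to UTF-8 or add the declaration.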

