[email protected] wrote:

>       It shows up rarely.  I can't make sense why.  There are

I can: it’s often produced by Microsoft users who save
in a legacy codepage encoding (typically cp1252); the
file is then converted to Unicode as if it were latin1.

Now, codepage 1252 is a superset of latin1’s printable
characters: latin1 leaves 0x80‥0x9F for the C1 control
characters (and latin1 is exactly the first 256 codepoints
of Unicode), while cp1252 assigns characters such as €, “
and ” inside that block.
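A minimal Python sketch of what goes wrong (the byte string is made-up sample data): the same bytes decode to “, € and ” under cp1252, but to the C1 controls U+0093, U+0080 and U+0094 under latin1, and those controls survive a later conversion to UTF-8 as two-byte sequences.

```python
# Made-up sample: bytes as a cp1252 (Windows) application would save them.
data = b"\x93\x80 5\x94"

# What the author meant: U+201C, U+20AC, " 5", U+201D.
as_cp1252 = data.decode("cp1252")

# What a latin1 -> Unicode conversion produces: every byte becomes the
# codepoint with the same value, so 0x80..0x9F turn into C1 controls.
as_latin1 = data.decode("latin1")

# Re-encoding as UTF-8 keeps the C1 controls (as 0xC2 0x80..0x9F pairs),
# which is what later shows up in the document.
utf8_bytes = as_latin1.encode("utf-8")

print(ascii(as_cp1252))
print([hex(ord(c)) for c in as_latin1])
print(utf8_bytes)
```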

So, basically, this is a mild source of Mojibake. But since
C1 control characters have no business existing inside an
HTML document, I’d resolve the ambiguity by parsing such
characters as misconverted cp1252 instead.
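One way to do that, sketched in Python under the assumption that every C1 character in the text came from misdecoded cp1252 (`repair_c1` is a hypothetical helper name, not anything in lynx): map each U+0080..U+009F character back to its byte and decode that byte as cp1252, leaving alone the five positions cp1252 does not assign (0x81, 0x8D, 0x8F, 0x90, 0x9D). Browsers do something comparable for numeric character references: the HTML standard maps `&#128;` to € rather than to U+0080.

```python
def repair_c1(text: str) -> str:
    """Reinterpret C1 control characters (U+0080..U+009F) as cp1252.

    Hypothetical helper: assumes any C1 character in the input is the
    result of cp1252 bytes having been decoded as latin1.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            # latin1 maps U+0080..U+009F to the byte with the same value.
            raw = bytes([ord(ch)])
            try:
                out.append(raw.decode("cp1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned in cp1252:
                # there is nothing better to map them to, so keep them.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


print(ascii(repair_c1("\x93quoted\x94 costs \x8010")))
```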

bye,
//mirabilos
-- 
> find both emacs and vi nauseating (joe rules) and consider pine the only
> usable text-mode mail client (and I've tried them all). ;)
Hellooooo, I'm Holger ("Hello Holger!"), and I, too, am
... a pine user, and a habitual one at that ("Oooooooohhh").  [from dasr]

_______________________________________________
Lynx-dev mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lynx-dev
