Package: lynx
Version: 2.8.8pre3-1
Severity: normal
If I run "lynx -dump" on this HTML:
<html>
<body>
This ( ) is a UTF-8 unbreakable space.
</body>
</html>
I get this output:
This (Â ) is a UTF-8 unbreakable space.
Note the "capital A with circumflex". This seems to be because the C2
A0 byte sequence is being interpreted as two ISO-8859-1 characters,
rather than as a single UTF-8 character.
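
To illustrate the suspected cause, here is a small Python sketch. It is
not taken from lynx and the variable names are only for illustration; it
just shows what happens when the UTF-8 bytes for U+00A0 are decoded with
each of the two charsets.

```python
# Bytes of U+00A0 (NO-BREAK SPACE) when encoded as UTF-8.
nbsp_utf8 = "\u00a0".encode("utf-8")

# Decoded as UTF-8, the two bytes form a single character.
as_utf8 = nbsp_utf8.decode("utf-8")

# Decoded as ISO-8859-1, each byte becomes its own character.
as_latin1 = nbsp_utf8.decode("iso-8859-1")

print(nbsp_utf8.hex())                          # c2a0
print(len(as_utf8), len(as_latin1))             # 1 2
print([f"U+{ord(c):04X}" for c in as_latin1])   # ['U+00C2', 'U+00A0']
```

U+00C2 is the capital A with circumflex, followed by the no-break space
itself, which matches the output above.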
If I add the "-assume_charset=utf8" option, it does what I expect, but
I believe that should be the default (especially since I have
LANG=en.utf8 as my locale).