Package: recode
Version: 3.6-20
Severity: normal

When converting text from HTML to UTF-8, characters in the input that are already valid UTF-8 get mangled. Example shell session:
$ cat sample.htm
<!doctype html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body><p>Entities: ÄÜÖ; UTF-8: ÄÜÖ</p></body></html>
$ file sample.htm
sample.htm: HTML document, UTF-8 Unicode text
$ recode html..utf8 < sample.htm
<!doctype html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body><p>Entities: ÄÜÖ; UTF-8: Ã�Ã�Ã�</p></body></html>

The same works fine in the case of ISO-8859-15 (and probably similar charsets, but I have tried only l9).

-- System Information:
Debian Release: 7.5
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages recode depends on:
ii  dpkg          1.16.14
ii  install-info  4.13a.dfsg.1-10
ii  libc6         2.18-5
ii  librecode0    3.6-20

recode recommends no packages.

recode suggests no packages.

-- no debconf information
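
The mangled output looks like double encoding: the UTF-8 bytes of the input appear to be read as Latin-1 and then converted to UTF-8 a second time. This is a guess based on the output alone, not something checked against the recode source. The sketch below reproduces the same byte pattern with iconv instead of recode; the variable names are only for illustration.

```shell
# "ÄÜÖ" as raw UTF-8 bytes (c3 84, c3 9c, c3 96), written in octal so that
# any POSIX printf works, including the one built into dash.
original=$(printf '\303\204\303\234\303\226' | od -An -tx1 | tr -d ' \n')

# Assumption: the converter misreads those bytes as ISO-8859-1 and encodes
# each byte as UTF-8 again. iconv stands in for recode here.
mangled=$(printf '\303\204\303\234\303\226' \
  | iconv -f ISO-8859-1 -t UTF-8 \
  | od -An -tx1 | tr -d ' \n')

echo "original: $original"
echo "mangled:  $mangled"
```

If the guess is right, every input character becomes "Ã" (c3 83) followed by a C1 control character (c2 84, c2 9c, c2 96), which would match the "Ã�" pairs in the recode output.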