Package: recode
Version: 3.6-20
Severity: normal

When converting text from HTML to UTF-8, valid UTF-8 characters that
are already present in the input get mangled. Example shell session:

$ cat sample.htm
<!doctype html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body><p>Entities: &Auml;&Uuml;&Ouml;; UTF-8: ÄÜÖ</p></body></html>
$ file sample.htm
sample.htm: HTML document, UTF-8 Unicode text
$ recode html..utf8 < sample.htm
<!doctype html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body><p>Entities: ÄÜÖ; UTF-8: Ã�Ã�Ã�</p></body></html>

The same conversion works fine with ISO-8859-15 (and probably with
similar charsets, but I have only tried l9).
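
The garbled "UTF-8:" part of the output above looks like classic double
encoding: the input bytes seem to be read as a single-byte charset such as
Latin-1 and then written out again as UTF-8. This is only a guess at the
cause, not something confirmed from the recode sources. A minimal Python
sketch of that assumed behaviour follows; the function names are
hypothetical and it does not call recode at all:

```python
import html

# The interesting part of sample.htm, as the UTF-8 bytes stored on disk.
SAMPLE = "Entities: &Auml;&Uuml;&Ouml;; UTF-8: ÄÜÖ".encode("utf-8")


def expand_entities_as_latin1(raw: bytes) -> str:
    """Assumed buggy path: read the bytes as Latin-1, then expand entities.

    Each byte of a multi-byte UTF-8 sequence becomes its own character,
    so re-encoding the result as UTF-8 double-encodes the original text.
    """
    return html.unescape(raw.decode("latin-1"))


def expand_entities_as_utf8(raw: bytes) -> str:
    """Expected path: read the bytes as UTF-8, then expand entities."""
    return html.unescape(raw.decode("utf-8"))


mangled = expand_entities_as_latin1(SAMPLE)
expected = expand_entities_as_utf8(SAMPLE)

# repr() makes the invisible C1 control characters in the mangled text visible.
print(repr(mangled))
print(repr(expected))
```

If this model is right, the entities come out correctly in both cases and
only the characters that were already UTF-8 are damaged, which matches the
session above.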


-- System Information:
Debian Release: 7.5
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages recode depends on:
ii  dpkg          1.16.14
ii  install-info  4.13a.dfsg.1-10
ii  libc6         2.18-5
ii  librecode0    3.6-20

recode recommends no packages.

recode suggests no packages.

-- no debconf information

