Your message dated Fri, 03 Oct 2008 12:47:03 +0000
with message-id <[EMAIL PROTECTED]>
and subject line Bug#496226: fixed in html2text 1.3.2a-10
has caused the Debian Bug report #496226,
regarding html2text: should recognize 'meta' html tag and make input recoding
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [EMAIL PROTECTED]
immediately.)
--
496226: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496226
Debian Bug Tracking System
Contact [EMAIL PROTECTED] with problems
--- Begin Message ---
Package: html2text
Version: 1.3.2a-6
Severity: minor
Hello,
As the information below says, I'm not using a UTF-8 locale. html2text
will however, on utf-8 html pages, produce UTF-8 text. Conversely, on a
UTF-8 system, html2text will, on latin1 html pages, produce latin1 text.
The recently added -utf8 option handles the UTF-8 on UTF-8 case, but
not the two cases above.
Generally speaking, there is no reason why the input and output charsets
should be related at all. For the input, html2text should recognize
the meta http-equiv tag; that should work for a lot of pages, else an
input-charset option can be provided. For the output, the current
locale's charset should be used (as returned by nl_langinfo(CODESET)
after calling setlocale(LC_CTYPE,"")); that should work in almost all
cases, else an output-charset option can be provided.
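
To make the two halves of that proposal concrete, here is a minimal C sketch, not html2text's actual code. The helper names find_meta_charset and locale_charset are hypothetical, and the charset lookup is a naive case-insensitive scan for "charset=" rather than a real HTML parser, so it would also match that string outside a meta tag.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: naive scan for the first "charset=" in the page.
   Copies the charset name into out and returns 1, or returns 0 if no
   usable name is found. This is an assumption-laden shortcut, not a parser. */
int find_meta_charset(const char *html, char *out, size_t outlen)
{
    static const char key[] = "charset=";
    const size_t keylen = sizeof key - 1;

    if (outlen == 0)
        return 0;
    for (const char *p = html; *p != '\0'; p++) {
        if (strncasecmp(p, key, keylen) != 0)
            continue;
        p += keylen;
        if (*p == '"' || *p == '\'')      /* skip an optional opening quote */
            p++;
        size_t n = 0;
        while (n + 1 < outlen &&
               (isalnum((unsigned char)*p) || *p == '-' || *p == '_' ||
                *p == '.' || *p == ':'))
            out[n++] = *p++;
        out[n] = '\0';
        return n > 0;
    }
    return 0;
}

/* Hypothetical helper: charset of the user's current locale, obtained the
   way the report suggests (setlocale, then nl_langinfo(CODESET)). */
const char *locale_charset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

When find_meta_charset returns 0, the caller would fall back to an explicit input-charset option, as suggested above.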
Yes, that means conversions. But that's the way charsets are supposed
to be handled. Note btw that for the conversions, one can just use
iconv_open(nl_langinfo(CODESET), page_charset), but one can also append
"//translit" to nl_langinfo(CODESET), so that iconv makes the
transliterations itself, i.e. turns curly quotes and long dashes into
equivalents in the target charset.
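
As a rough sketch of that conversion step (again, not html2text's implementation): the helper name recode_translit and the fixed-size output buffer are assumptions made for illustration. The "//TRANSLIT" suffix is an extension of GNU iconv (glibc and GNU libiconv), and which substitutions it makes depends on the implementation and locale.

```c
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from charset
   `from` to charset `to`, asking iconv to transliterate characters that the
   target charset cannot represent. Returns the number of bytes written
   (excluding the terminating NUL), or -1 on error. */
long recode_translit(const char *from, const char *to,
                     const char *in, char *out, size_t outlen)
{
    char target[64];
    if (outlen == 0 ||
        snprintf(target, sizeof target, "%s//TRANSLIT", to) >= (int)sizeof target)
        return -1;

    iconv_t cd = iconv_open(target, from);   /* tocode first, fromcode second */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv's prototype wants char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outlen - 1;     /* keep one byte for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);   /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}
```

A caller following the suggestion above would pass the page charset as `from` and nl_langinfo(CODESET) as `to`. A real implementation would also have to grow the output buffer when iconv fails with E2BIG.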
Samuel
-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.26
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=ISO-8859-15)
Shell: /bin/sh linked to /bin/bash
Versions of packages html2text depends on:
ii libc6 2.7-13 GNU C Library: Shared libraries
ii libgcc1 1:4.3.1-2 GCC support library
ii libstdc++6 4.3.1-2 The GNU Standard C++ Library v3
html2text recommends no packages.
Versions of packages html2text suggests:
ii curl 7.18.2-5 Get a file from an HTTP, HTTPS or
ii wget 1.11.4-1 retrieves files from the web
-- no debconf information
--- End Message ---
--- Begin Message ---
Source: html2text
Source-Version: 1.3.2a-10
We believe that the bug you reported is fixed in the latest version of
html2text, which is due to be installed in the Debian FTP archive:
html2text_1.3.2a-10.diff.gz
to pool/main/h/html2text/html2text_1.3.2a-10.diff.gz
html2text_1.3.2a-10.dsc
to pool/main/h/html2text/html2text_1.3.2a-10.dsc
html2text_1.3.2a-10_i386.deb
to pool/main/h/html2text/html2text_1.3.2a-10_i386.deb
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to [EMAIL PROTECTED],
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Eugene V. Lyubimkin <[EMAIL PROTECTED]> (supplier of updated html2text package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [EMAIL PROTECTED])
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Format: 1.8
Date: Sat, 20 Sep 2008 14:10:09 +0300
Source: html2text
Binary: html2text
Architecture: source i386
Version: 1.3.2a-10
Distribution: experimental
Urgency: low
Maintainer: Eugene V. Lyubimkin <[EMAIL PROTECTED]>
Changed-By: Eugene V. Lyubimkin <[EMAIL PROTECTED]>
Description:
html2text - advanced HTML to text converter
Closes: 285378 307425 496226 498797
Changes:
html2text (1.3.2a-10) experimental; urgency=low
.
* debian/rules:
- Really install NEWS.Debian as documentation.
* debian/patches:
- 220-nobs-when-stdout-is-a-tty.patch: deleted, useless now, since
backspaces are not produced at all.
- 400-remove-builtin-http-support.patch: refreshed.
- 500-utf8-support.patch: refreshed.
- 510-utf8-implies-nobs.patch: deleted, useless now.
- New 510-disable-backspaces.patch: disable backspaces because parser
cannot produce them rightly in multi-byte sequences now.
- 611-recognize-input-encoding.patch:
- Corrected to not produce an error if the '-nometa' option was not supplied
and the input html doesn't contain a valid 'meta http-equiv' tag.
- Corrected to not display debug info twice (if -debug-parser or
-debug-scanner was supplied).
- Corrected: now parser always processes UTF-8 text, needed for proper
output recoding.
- Moved recoding code to separate function.
- Close input stream directly after read, not after the processing.
- Correctly mark the end of converted sequence.
- New 630-recode-output-to-locale-charset.patch: convert output to current
locale charset. (Closes: #498797)
- 300-replace-zeroes-with-null.patch: renamed to
800-replace-zeroes-with-null.patch.
- New 810-fix-deprecated-conversion-warnings.patch: fix 'deprecated
conversion from string constant to ‘char*’' warnings during build by
supplying 'const' qualifier in needed places.
* debian/README.Debian:
- Renamed 'META HTTP-EQUIV' section to 'Input recoding'.
- Added correct input encoding cases to 'Input recoding' section.
- Added 'Backspaces' section.
- Added 'Output recoding' section.
* debian/html2text.1:
- Mentioned that Debian version of html2text doesn't produce backspaces,
so '-nobs' does nothing.
- Added paragraph about input/output recoding.
* debian/NEWS.Debian:
- Corrected news for 1.3.2a-9.
* debian/control:
- Renewed long description.
[unera]
* debian/changelog:
- fixed incorrect changelog record 1.3.2a-9 (Thanks to Stanislav
Maslovski <[EMAIL PROTECTED]> for the
600-multiple-meta-tags.patch :)).
.
html2text (1.3.2a-9) experimental; urgency=low
.
The "grepping binary device for patch parts" release.
* debian/patches:
- Refreshed all patches.
- Add comments to all patches.
- New patch 400-remove-builtin-http-support.patch: remove limited built-in
http support. "Wontfix" bugs related to http support are closed thus.
(Closes: #307425, #285378)
- New patch 600-multiple-meta-tags.patch: recognize all 'meta' tags, not
one. Thanks to Stanislav Maslovski <[EMAIL PROTECTED]> for
the patch, thanks to Dmitry E. Oboukhov for the idea of patch.
- New patch 611-recognize-input-encoding.patch: recode input according to
'meta' tag. Thanks to Dmitry E. Oboukhov for the idea of patch.
(Closes: #496226)
* debian/html2text.1:
- Mentioned new '-nometa' option.
- Updated descriptions of '-utf8' and '-ascii' options.
- Mentioned that Debian version of html2text has no http support.
- Updated author's mail and download page.
* debian/README.Debian:
- Updated HTTP section, wrote META HTTP-EQUIV section.
Checksums-Sha1:
b2f86e2c6de48dbb33fd8ef1c4bc57e7ad3db209 1033 html2text_1.3.2a-10.dsc
6b916eee26412677e814d6240b82354c7d265889 27387 html2text_1.3.2a-10.diff.gz
ba895bdab623c68842ae74c575290f3e14b868f5 97532 html2text_1.3.2a-10_i386.deb
Checksums-Sha256:
9de781984b64445d96686ac95bc2e8dbae1e5745aef8714b428c9034f8d65e88 1033 html2text_1.3.2a-10.dsc
dee337dbafa0b79eff59b215e5727696d444bd524c8f990b8417be818e4296e7 27387 html2text_1.3.2a-10.diff.gz
314c924bc21be146af89f6764243310d1b25f5b6e99f587c6363a3e9feac6891 97532 html2text_1.3.2a-10_i386.deb
Files:
338106f0781fa56e59a8b2b0d054326c 1033 web optional html2text_1.3.2a-10.dsc
1f93477ccdee23a16733dc0b611b8553 27387 web optional html2text_1.3.2a-10.diff.gz
b06963d527ecbc741aade80115b1a3f9 97532 web optional html2text_1.3.2a-10_i386.deb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFI5g+Qq4wAz/jiZTcRAn99AKDJ824+P0f7AMnG70zZa5zWk9OUbQCgt9Pn
So82b7Io7lR+JK47K9Ahj6o=
=R0Tp
-----END PGP SIGNATURE-----
--- End Message ---