Package: html2text
Version: 1.3.2a-14
Severity: normal

Hi,
I am trying to build the Spanish documentation of aptitude (from its repository, revision 3228:c354bd7ae8c7) using

  $ make -C debug/doc/es

which calls

  rm -fr output-txt
  xsltproc -o output-txt/index.html ../../../doc/es/../aptitude-txt.xsl aptitude.xml
  Error: no ID for constraint linkend: configAptInstallRecommends.
  html2text -width 80 -ascii -nobs -rcfile ../../../doc/es/../aptitude-txt.style output-txt/index.html | ../../../doc/es/../fixup-text > README.es

This results in a bogus text file README.es.

First of all, a lot of UTF-8 characters are used (in a UTF-8 environment) even though -ascii was given. Examples from the first lines of the file:

<quote>
Versi????n 0.5.3.1
Copyright ???? 2004-2008 Daniel Burrows
</quote>

After removing the option -ascii (which doesn't work as expected), one still doesn't get a proper UTF-8 file:

  aptitude/debug/doc/es$ html2text -width 80 -rcfile ../../../doc/es/../aptitude-txt.style output-txt/index.html | \
      grep "la mitad inferior de la pantalla"
  | |??rea de informaci??n (la mitad inferior de la pantalla). El ??rea de informaci???|n

As you can see, the problem is the vertical column separator |, which probably splits the two bytes of the last multibyte character and so makes the file invalid UTF-8.

I assumed this would be easy to reproduce, but my attempt failed with a different error:

  $ html2text -width 10 test.html
  Input recoding failed due to invalid input sequence. Unconverted part of text follows.
  #|?? |?????? ??#|?? |?????? ??#|?? |?????? ??#|?? |?????? ??#|?? |?????? ??#|?? |??????____|

This error message is wrong: test.html is a proper HTML file in latin1 encoding!

So many errors ...
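To illustrate the suspected cause described above (a column separator inserted at a byte offset rather than a character offset), here is a minimal Python sketch. It is not html2text's actual code; the sample text, the width of 4, and the helper names are assumptions chosen only for illustration. It shows that cutting UTF-8 data by bytes can split a two-byte character, while cutting by characters cannot.

```python
# Illustration only: not taken from html2text. The sample string, the width,
# and all function names are assumptions made for this sketch.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def wrap_by_bytes(data: bytes, width: int) -> list:
    """Cut the encoded text every `width` bytes (the suspected behaviour)."""
    return [data[i:i + width] for i in range(0, len(data), width)]

def wrap_by_chars(text: str, width: int) -> list:
    """Cut the text every `width` characters, then encode each piece."""
    return [text[i:i + width].encode("utf-8") for i in range(0, len(text), width)]

text = "El área de información"
raw = text.encode("utf-8")

byte_chunks = wrap_by_bytes(raw, 4)
char_chunks = wrap_by_chars(text, 4)

# The first byte chunk ends in the middle of the two-byte sequence for "á".
print(byte_chunks[0], is_valid_utf8(byte_chunks[0]))
print(char_chunks[0], is_valid_utf8(char_chunks[0]))
```

If a separator such as | is written between the byte chunks, the two halves of the split character no longer sit next to each other, and the resulting file is not valid UTF-8.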
Jens

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (900, 'testing'), (800, 'unstable'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26 (SMP w/1 CPU core)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages html2text depends on:
ii  libc6       2.9-25     GNU C Library: Shared libraries
ii  libgcc1     1:4.4.1-1  GCC support library
ii  libstdc++6  4.4.1-1    The GNU Standard C++ Library v3

html2text recommends no packages.

Versions of packages html2text suggests:
ii  curl  7.19.5-1  Get a file from an HTTP, HTTPS or
ii  wget  1.11.4-4  retrieves files from the web

-- no debconf information
| öäü öäü öäü öäü öäü öäü öäü öäü öäü öäü öäü öäü öäü |
|---|
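The "invalid input sequence" message for the latin1 test file shown above would be consistent with the input being read as UTF-8 (the locale charmap), although this report does not confirm that this is what html2text does. The following Python sketch only checks the encoding facts involved; the sample string and the helper name are assumptions for illustration.

```python
# Illustration only: checks how the bytes of "öäü" behave under two encodings.
# The helper name and sample string are assumptions made for this sketch.

def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` can be decoded with `encoding` without errors."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

sample = "öäü"
latin1_bytes = sample.encode("latin-1")  # one byte per character

print(latin1_bytes)
print("valid latin1:", decodes_as(latin1_bytes, "latin-1"))
print("valid UTF-8: ", decodes_as(latin1_bytes, "utf-8"))
```

So a file like this is perfectly valid latin1, but a converter that assumes UTF-8 input would reject it at the first umlaut.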

