Package: poppler-utils
Version: 0.4.5-5.1
Severity: normal

Sometimes the command

pdftohtml -enc UTF-8 -i -hidden -xml file.pdf

produces invalid xml, with crossed tags, like this:

<text top="264" left="120" width="106" height="12" font="2"><b>naň skoro ako na<i> zázrak.</b></i></text>

This happens often with PDFs from ABBYY FineReader. I put an example at:
http://kassiopeia.juls.savba.sk/~garabik/junk/pdftohtml/

-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.15-1-powerpc
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages poppler-utils depends on:
ii  libc6          2.3.6.ds1-8  GNU C Library: Shared libraries
ii  libgcc1        1:4.1.1-13   GCC support library
ii  libpoppler0c2  0.4.3-3      PDF rendering library
ii  libstdc++6     4.1.1-13     The GNU Standard C++ Library v3

poppler-utils recommends no packages.

-- no debconf information
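
For anyone trying to reproduce this, the crossed tags can be confirmed with any XML parser. Below is a minimal sketch in Python, assuming only the standard library's xml.etree.ElementTree. The helper name is_well_formed and the "nested" comparison string are made up for illustration and are not part of poppler or of the original output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Crossed tags such as <b>..<i>..</b></i> end up here.
        return False


# The line quoted in this report: <b> is closed while <i> is still open.
crossed = ('<text top="264" left="120" width="106" height="12" font="2">'
           '<b>naň skoro ako na<i> zázrak.</b></i></text>')

# Hypothetical properly nested variant, for comparison only.
nested = ('<text top="264" left="120" width="106" height="12" font="2">'
          '<b>naň skoro ako na<i> zázrak.</i></b></text>')

print(is_well_formed(crossed))  # False
print(is_well_formed(nested))   # True
```

The same check could be run over a whole pdftohtml output file by reading it and passing its contents to the helper; that would only report that the file fails to parse, not which element is at fault.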

