Package: poppler-utils
Version: 0.4.5-5.1
Severity: normal

Sometimes the command
  pdftohtml -enc UTF-8 -i -hidden -xml file.pdf
produces invalid XML with crossed (improperly nested) tags, like this:
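  <text top="264" left="120" width="106" height="12" font="2"><b>naň skoro ako na<i> zázrak.</b></i></text>

Here <i> is opened inside <b>, but </b> is closed before </i>, so the output
is not well-formed and XML parsers reject it.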
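
One quick way to confirm this is to feed the fragment to an XML parser. The
sketch below uses Python's standard xml.etree.ElementTree; the NESTED string is
a hypothetical corrected version added only for comparison, not something
pdftohtml produced.

```python
import xml.etree.ElementTree as ET

# Fragment from the pdftohtml -xml output quoted above:
# <b> is opened first, but </b> is closed before </i>.
CROSSED = ('<text top="264" left="120" width="106" height="12" font="2">'
           '<b>naň skoro ako na<i> zázrak.</b></i></text>')

# Hypothetical properly nested version, for comparison only.
NESTED = ('<text top="264" left="120" width="106" height="12" font="2">'
          '<b>naň skoro ako na<i> zázrak.</i></b></text>')


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("crossed:", is_well_formed(CROSSED))  # crossed: False
    print("nested:", is_well_formed(NESTED))    # nested: True
```

To check a whole output file the same way (assuming it was written to
file.xml), ET.parse("file.xml") raises ET.ParseError on the first crossed tag.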

This happens often with PDFs produced by ABBYY FineReader; I have put an example at:
http://kassiopeia.juls.savba.sk/~garabik/junk/pdftohtml/

-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: powerpc (ppc)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.15-1-powerpc
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages poppler-utils depends on:
ii  libc6                        2.3.6.ds1-8 GNU C Library: Shared libraries
ii  libgcc1                      1:4.1.1-13  GCC support library
ii  libpoppler0c2                0.4.3-3     PDF rendering library
ii  libstdc++6                   4.1.1-13    The GNU Standard C++ Library v3

poppler-utils recommends no packages.

-- no debconf information
