Package: poppler-utils
Version: 0.26.5-2
Severity: normal

Dear Maintainer,

Running pdftotext on the attached PDF produces some spurious form feeds.
I'm using pdftotext to extract text for indexing an 800-page document,
and the spurious form feeds throw off the page counts.
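
To make the miscount concrete, here is a minimal Python sketch of the check I have in mind. It assumes pdftotext ends each page (including the last) with a single form feed, so the number of chunks should equal the page count. The helper names `split_pages` and `excess_page_breaks` are made up for illustration and are not part of poppler.

```python
FORM_FEED = "\f"


def split_pages(text):
    """Split extracted text on form feeds.

    Assumes every page is terminated by a form feed, so the empty
    string left after the final terminator is dropped.
    """
    pages = text.split(FORM_FEED)
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def excess_page_breaks(text, expected_pages):
    """Return how many more pages the text appears to contain than expected."""
    return len(split_pages(text)) - expected_pages
```

A nonzero result for a PDF whose real page count is known (for example from pdfinfo) would point at spurious form feeds inside page content.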
I've worked around the problem by altering libpoppler46 to grab the
page-termination sequence from an environment variable, but that's
just a hack.
- A good fix would be to find out why the vertical bars in the attached
PDF are rendered as form feeds, and to render them as suitable ASCII
or Unicode characters instead.
- An acceptable workaround would be to replace the -nopgbrk option
with an option that sets the "page break" string to a string given
on the command line (which could be empty). That second solution
would appear to involve some nontrivial refactoring of the code
base, which is why I haven't tried to make a patch for it.
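
Until one of those fixes exists, the second idea can be approximated from outside pdftotext. The sketch below is only a rough illustration, not a proposed patch: it extracts one page at a time with the existing -f, -l and -nopgbrk options, removes any form feed left inside the page text, and joins the pages with a caller-chosen separator. The function names and the `replacement` parameter are hypothetical, and I am assuming the output is UTF-8.

```python
import subprocess


def clean_page(text, replacement=""):
    """Replace every form feed inside a single page's text."""
    return text.replace("\f", replacement)


def join_pages(pages, page_break=""):
    """Join cleaned pages with a caller-chosen page-break string."""
    return page_break.join(clean_page(page) for page in pages)


def extract_page(pdf_path, page_number):
    """Run pdftotext on a single page and return its text.

    Uses -f/-l to select the page and -nopgbrk to suppress the
    page terminator; "-" sends the text to standard output.
    """
    result = subprocess.run(
        ["pdftotext", "-f", str(page_number), "-l", str(page_number),
         "-nopgbrk", pdf_path, "-"],
        check=True,
        stdout=subprocess.PIPE,
    )
    # Assumption: pdftotext output is UTF-8 (its default encoding).
    return result.stdout.decode("utf-8", errors="replace")
```

Running pdftotext once per page is slow for an 800-page document, and deleting the form feeds also loses whatever glyph they stood for, so this only restores correct page counts; it does not fix the rendering.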
-- System Information:
Debian Release: 8.1
APT prefers stable
APT policy: (990, 'stable')
Architecture: i386 (x86_64)
Foreign Architectures: amd64
Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages poppler-utils depends on:
ii libc6 2.19-18
ii libcairo2 1.14.0-2.1
ii libfreetype6 2.5.2-3
ii libgcc1 1:4.9.2-10
ii liblcms2-2 2.6-3+b3
ii libpoppler46 0.26.5-2.1
ii libstdc++6 4.9.2-10
ii zlib1g 1:1.2.8.dfsg-2+b1
poppler-utils recommends no packages.
poppler-utils suggests no packages.
-- no debconf information
page23.pdf
Description: Adobe PDF document

