Package: poppler-utils
Version: 0.26.5-2
Severity: normal

Dear Maintainer,

Running pdftotext on the attached PDF produces some spurious form feeds.
I'm using pdftotext to extract text for indexing an 800-page document,
and the spurious form feeds throw off the page counts.

I've worked around the problem by altering libpoppler46 to grab the
page-termination sequence from an environment variable, but that's
just a hack.

  - A good fix would be to find out why the vertical bars in this PDF
    are rendered as form feeds and to render them as suitable ASCII or
    Unicode characters instead.

  - An acceptable workaround would be to replace the -nopgbrk option
    with an option that sets the "page break" string to a string given
    on the command line (which could be empty).  That second solution
    would appear to involve some nontrivial refactoring of the code
    base, which is why I haven't tried to make a patch for it.
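
Until something like that exists, the second idea can be approximated
outside pdftotext. The following is only a minimal sketch in Python, not
a patch: it extracts one page at a time with the existing -f, -l and
-nopgbrk options, so page boundaries come from the loop rather than from
form feeds in the output. The names extract_page and join_pages and the
"stray" parameter are my own, hypothetical choices, and substituting "|"
for a stray form feed is an assumption based on the vertical bars
described above.

```python
import subprocess


def extract_page(pdf_path, page_number):
    """Extract a single page as text using pdftotext.

    -f/-l restrict the run to one page and -nopgbrk suppresses the page
    break that pdftotext would otherwise append. Requires pdftotext on PATH.
    """
    result = subprocess.run(
        ["pdftotext", "-f", str(page_number), "-l", str(page_number),
         "-nopgbrk", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def join_pages(pages, page_break="\f", stray=""):
    """Join per-page text, ending each page with page_break.

    Any form feed left inside a page body is treated as spurious and
    replaced by `stray` (an assumption: pass "|" to restore a bar).
    """
    cleaned = [page.replace("\f", stray) for page in pages]
    return "".join(text + page_break for text in cleaned)
```

With this, something like
join_pages([extract_page("page23.pdf", 1)], page_break="") yields text
with no page breaks at all, and the number of pages is simply the number
of extract_page calls, whatever form feeds the page bodies contain.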



-- System Information:
Debian Release: 8.1
  APT prefers stable
  APT policy: (990, 'stable')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages poppler-utils depends on:
ii  libc6         2.19-18
ii  libcairo2     1.14.0-2.1
ii  libfreetype6  2.5.2-3
ii  libgcc1       1:4.9.2-10
ii  liblcms2-2    2.6-3+b3
ii  libpoppler46  0.26.5-2.1
ii  libstdc++6    4.9.2-10
ii  zlib1g        1:1.2.8.dfsg-2+b1

poppler-utils recommends no packages.

poppler-utils suggests no packages.

-- no debconf information

Attachment: page23.pdf
Description: Adobe PDF document
