Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal
I like the idea behind pdftotext and have been
using it a lot.
Unfortunately, I recently discovered that it
appears to corrupt data.
It changed the minus sign ("-") to "2" in the
tables of a scientific paper.
I think we can agree that silently corrupting data
extracted from scientific PDFs is a serious problem.
Here's how I noticed it:
1.) Download a copy of the PDF file at
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2825258/pdf/pone.0009339.pdf
2.) At the shell prompt, type (I had saved the file as
"/tmp/2010-03-Cinnamon increases life span.pdf"):
$ pdftotext -layout /tmp/2010-03-Cinnamon\ increases\ life\ span.pdf - | less
3.) Scroll down in less to Table 3.
4.) Look for the line that begins with
"Atractylodes japonica".
5.) Note that the value in the "% change" column
is "21.1".
6.) Look at the same number in the original PDF
file. It should be "-1.1"! The "-" was silently
corrupted to "2".
Other numbers in the same column that should
begin with "-" were also corrupted to "2".
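Steps 2 to 5 can also be scripted so the check is easy to repeat against a newer poppler-utils. The following is a minimal sketch, assuming Python 3.7+ and pdftotext on the PATH. The helper names (extract_layout_text, find_rows) are hypothetical, and "-enc UTF-8" is an addition that is not part of the command above; it is there only so the output decodes predictably.

```python
import subprocess
import sys
from typing import List

# Row label from step 4 of the report.
ROW_PREFIX = "Atractylodes japonica"


def extract_layout_text(pdf_path: str) -> str:
    """Run pdftotext in -layout mode and return the text it writes to stdout.

    Passing "-" as the output file sends the text to stdout, as in step 2.
    "-enc UTF-8" is an assumption added so the output decodes predictably.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def find_rows(text: str, prefix: str) -> List[str]:
    """Return every line whose first non-blank text starts with prefix."""
    return [line for line in text.splitlines() if line.lstrip().startswith(prefix)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print each matching row so its "% change" value can be compared
    # by eye with the same row in the PDF (steps 5 and 6).
    for row in find_rows(extract_layout_text(sys.argv[1]), ROW_PREFIX):
        print(row)
```

Usage: python3 find_rows.py pone.0009339.pdf (the script name is arbitrary). It only locates the row and does not decide which characters are wrong; the comparison with the PDF is still done by eye.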
Thanks,
Kingsley
-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'lenny'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages poppler-utils depends on:
ii  libc6                   2.11.2-7        Embedded GNU C Library: Shared lib
ii  libfontconfig1          2.8.0-2.1       generic font configuration library
ii  libgcc1                 1:4.4.5-10      GCC support library
ii  libpoppler5             0.12.4-1.1      PDF rendering library
ii  libstdc++6              4.4.5-10        The GNU Standard C++ Library v3
ii  libxml2                 2.7.6.dfsg-1    GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript             8.71~dfsg2-6    The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information