On 2007-03-22, at 01:07, Shelly Spearing wrote:
Have you looked at xpdf?
See
http://www.foolabs.com/xpdf/
--Shelly
On Mar 21, 2007, at 3:25 PM, Vic Norton wrote:
What is a good package for extracting text from a PDF file?
There's also the Text Extraction Toolkit (TET) from PDFlib:
<http://www.pdflib.com/products/tet/>. It comes with Perl bindings. This is
commercial software, and will cost you $199 (or more for a server).
While I've used PDFlib in the past, and it was a good solid product,
I can't speak for TET.
As an alternative approach, how about a bit of coarse AppleScripting?
If you open the document in Safari, Select All, then Copy, your
clipboard will contain something that pastes as plain text into an app
that won't accept anything more sophisticated. (Apps that accept rich
text will get it in that format.) For some reason, Preview is not
scriptable (shame, Apple, shame), and neither is Adobe Reader. You can
even combine AppleScript and Perl with, for example, Mac::AppleScript
or Mac::Glue.
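A rough, untested sketch of the Safari route, in case it helps. It shells out to osascript and pbpaste rather than using Mac::AppleScript, purely to avoid a module dependency. The helper names (safari_copy_script, pdf_text_via_safari) and the fixed delay are my own inventions. It assumes an absolute path, and that GUI scripting through System Events is enabled on your machine.

```perl
use strict;
use warnings;

# Hypothetical helper: build an AppleScript that opens a PDF in Safari,
# waits for it to render, then does Select All and Copy via System Events.
sub safari_copy_script {
    my ($path, $delay) = @_;
    $delay = 3 unless defined $delay;    # assumed load time, in seconds

    # Escape backslashes and double quotes for an AppleScript string literal.
    (my $quoted = $path) =~ s/(["\\])/\\$1/g;

    return join "\n",
        'tell application "Safari"',
        '    activate',
        qq{    open POSIX file "$quoted"},
        'end tell',
        "delay $delay",
        'tell application "System Events"',
        '    keystroke "a" using command down',
        '    keystroke "c" using command down',
        'end tell';
}

# Hypothetical helper: run the script, then read the clipboard as plain text.
sub pdf_text_via_safari {
    my ($path) = @_;
    system('osascript', '-e', safari_copy_script($path)) == 0
        or die "osascript failed (exit status $?)";
    my $text = `pbpaste`;
    return $text;
}
```

Something like pdf_text_via_safari('/Users/you/report.pdf') should then hand the text to Perl. It is crude: it takes over the frontmost window and clobbers the clipboard, so it suits an occasional job better than a server.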
For example I get monthly PDF reports from E*Trade, an online
broker. It would be very convenient if I could access this data
from Perl.
Of course. IANAL, but you might want to check that this is OK under
E*Trade's terms of use.
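If you go the xpdf route Shelly suggested, getting at the text from Perl could look roughly like the sketch below. It assumes xpdf's pdftotext command-line tool is installed and on your PATH. The helper names (pdftotext_command, pdf_to_text) are made up for illustration, and I have not tried it on an E*Trade report.

```perl
use strict;
use warnings;

# Hypothetical helper: argument list for xpdf's pdftotext.
# "-layout" tries to preserve the page's column layout;
# a text-file argument of "-" sends the text to stdout.
sub pdftotext_command {
    my ($pdf) = @_;
    return ('pdftotext', '-layout', $pdf, '-');
}

# Hypothetical helper: run pdftotext and return the extracted text.
sub pdf_to_text {
    my ($pdf) = @_;
    # List-form pipe open bypasses the shell, so odd file names are safe.
    open(my $fh, '-|', pdftotext_command($pdf))
        or die "Cannot run pdftotext: $!";
    my $text = do { local $/; <$fh> };    # slurp all output at once
    close($fh) or die "pdftotext failed on $pdf (exit status $?)";
    return $text;
}
```

Calling pdf_to_text('report.pdf') should give you one string to pick apart with regexes. Whether -layout helps or hurts depends on how the report's tables are laid out, so try it both ways.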
--
Dominic Dunlop