On 2007-03-22, at 01:07, Shelly Spearing wrote:

Have you looked at xpdf?
See
http://www.foolabs.com/xpdf/
--Shelly
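
For anyone trying the xpdf route: the package ships a command-line tool, pdftotext, that writes a PDF's text to a file or to standard output. Below is a minimal sketch of driving it from a script, assuming pdftotext is installed and on the PATH. The helper names (pdftotext_command, extract_text) are made up for illustration, and Python is used only to keep the sketch short; the same command line could be run from Perl through a pipe.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=True):
    """Build the argument list for xpdf's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical layout of the page,
        # which tends to help with column-style reports.
        cmd.append("-layout")
    # A lone "-" as the output file sends the text to standard output.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; is xpdf installed?")
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext exits non-zero
    )
    # The output encoding depends on pdftotext's configuration, so decode
    # leniently rather than assume every byte is valid UTF-8.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("statement.pdf") (a placeholder file name) would return the text of that file, which can then be split into lines and searched as usual. How well the result matches the original page depends on how the PDF was produced.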

On Mar 21, 2007, at 3:25 PM, Vic Norton wrote:

What is a good package for extracting text from a PDF file?

There's also the Text Extraction Toolkit from PDFlib: <http://www.pdflib.com/products/tet/>. It comes with Perl bindings. This is commercial software, and will cost you $199 (or more for a server). While I've used PDFlib in the past, and it was a good solid product, I can't speak for TET.

As an alternative approach, how about a bit of coarse AppleScripting? If you open the document in Safari, Select All, then Copy, your clipboard will contain something that pastes as plain text into an app that won't accept anything more sophisticated. (Apps that accept rich text will get it in that format.) For some reason, Preview is not scriptable (shame, Apple, shame), and neither is Adobe Reader. You can even combine AppleScript and Perl with, for example, Mac::AppleScript or Mac::Glue.

For example, I get monthly PDF reports from E*Trade, an online broker. It would be very convenient if I could access this data from Perl.

Of course. IANAL, but you might want to check that this is OK under the terms of use.
--
Dominic Dunlop

