Re: pdf2text ?

Dominic Dunlop Thu, 22 Mar 2007 02:29:38 -0800

On 2007–03–22, at 01:07, Shelly Spearing wrote:

Have you looked at xpdf?
See
http://www.foolabs.com/xpdf/
--Shelly
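
To make the xpdf suggestion concrete: the xpdf distribution includes a command-line tool called pdftotext, which can be driven from a script. The following is only a minimal sketch in Python (the same command line could equally be run from Perl). The helper names build_pdftotext_command and extract_text are made up for illustration, and it assumes pdftotext is installed and on your PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=True, encoding="UTF-8"):
    """Return the argument list for xpdf's pdftotext, writing text to stdout.

    Hypothetical helper, not part of xpdf itself.
    """
    cmd = ["pdftotext"]
    if layout:
        # -layout tries to preserve the physical page layout,
        # which tends to help with tabular reports.
        cmd.append("-layout")
    cmd += ["-enc", encoding]
    # Passing "-" as the text file sends the output to standard output.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Run pdftotext on pdf_path and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH; install xpdf first")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout=layout),
        check=True,            # raise CalledProcessError if pdftotext fails
        capture_output=True,   # collect stdout instead of printing it
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("statement.pdf") (a hypothetical file name) would return the text of that document. How well columns and tables survive depends on the PDF, so it may be worth trying both with and without -layout.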


On Mar 21, 2007, at 3:25 PM, Vic Norton wrote:

What is a good package for extracting text from a PDF file?

There's also the Text Extraction Toolkit from PDFlib: <http://www.pdflib.com/products/tet/>. It comes with Perl bindings. This is commercial software, and will cost you $199 (or more for a server). While I've used PDFlib in the past, and it was a good solid product, I can't speak for TET.

As an alternative approach, how about a bit of coarse AppleScripting? If you open the document in Safari, Select All, then Copy, your clipboard will contain something that pastes as plain text into an app that won't accept anything more sophisticated. (Apps that accept rich text will get it in that format.) For some reason, Preview is not scriptable (shame, Apple, shame), and nor is Adobe Reader. You can even combine AppleScript and Perl with, for example, Mac::AppleScript or Mac::Glue.

For example I get monthly PDF reports from E*Trade, an online broker. It would be very convenient if I could access this data from Perl.

Of course. However (IANL), you might want to check that this is OK under the terms of use.

--
Dominic Dunlop
