Re: pdf2text ?

Vic Norton Fri, 23 Mar 2007 13:47:03 -0800

As I mentioned, John Delacour's GUI scripting solution will work fine for me. 
However, another approach might work on more complicated PDF documents.

At Shelly Spearing's suggestion I "looked at xpdf". In fact I installed "xpdf" 
and "pdftohtml" via darwinports. Now the command
   pdftohtml somefile.pdf
will produce one or more html documents that can be accessed with Perl. It 
might be easier to find things in this html code than in text copied from the 
original PDF file.

This is just a thought; I haven't tried it.

Regards,

Vic

On 3/23/07, at 3:52 PM -0700, Avi Rappoport wrote:
> >If all that's needed is to copy the whole text of a pdf window and 
> >put it in a text file, then GUI scripting can be used.
> >
> This will work for some PDF files, but not all.  Some have no text at 
> all (scanned only), and in others the text has been generated badly, so 
> a two-column page will have text from column 1, line 1; then column 
> 2, line 1; then column 1, line 2...
> 
> Sigh.
> 
> Avi

On 3/21/07, at 8:07 PM -0600, Shelly Spearing wrote:
> Have you looked at xpdf?
> See
> http://www.foolabs.com/xpdf/
> --Shelly
