Re: [909linux] parsing PDF with Perl

Roger E. Rustad, Jr. Wed, 11 Oct 2006 16:52:11 -0700 (PDT)

Interesting - running "pdftotext.exe -raw file.pdf file.txt" dumps it into a
format similar to what you'd get if you selected all and then copied/pasted.
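
A minimal sketch of the follow-on step, turning the text dump into delimited
records. It is written in Python purely for illustration; the same idea maps
onto Perl's split with a pattern like /\s{2,}/. The function name
to_delimited, the FIELD_GAP pattern, and the sample rows are all hypothetical,
and the rule "two or more spaces separate fields" is an assumption about the
dump, not something confirmed for Business_List.pdf. If the -raw output does
not keep such gaps, the pattern would need to change.

```python
import re

# Assumption: fields in the text dump are separated by runs of 2+ whitespace
# characters, while single spaces occur only inside a field.
FIELD_GAP = re.compile(r"\s{2,}")


def to_delimited(lines, sep="\t"):
    """Convert whitespace-aligned text lines into sep-delimited records."""
    records = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between records
        records.append(sep.join(FIELD_GAP.split(stripped)))
    return records


if __name__ == "__main__":
    # Made-up rows for illustration; not taken from the actual PDF.
    sample = [
        "ACME HARDWARE   123 MAIN ST   RIVERSIDE",
        "",
        "  JOE'S CAFE    45 ELM AVE ",
    ]
    for record in to_delimited(sample):
        print(record)
```

With a real dump, the lines could come straight from the file, e.g.
to_delimited(open("file.txt")), since trailing newlines are stripped along
with other surrounding whitespace.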


On 10/11/06, Roger E. Rustad, Jr. <[email protected]> wrote:


Just ran pdftotext, and the output comes out worse than if I had just
selected all, copied, and pasted into another application.

It looks like I might need to play with some of those switches a bit
more...

Thanks for the recommendation, Joel.

On 10/11/06, Joel Brauer <[email protected]> wrote:
>
> I would start with pdftotext and then parse from there...
>
> pdftotext is part of the poppler-utils package on my system (Ubuntu)
>
> -joel
>
> On Wed, 2006-10-11 at 15:13 -0700, Roger E. Rustad, Jr. wrote:
> > I need to parse this PDF into a delimited text format
> >
> > http://www.riversideca.gov/finance/pdf/Business_List.pdf
> >
> > (Ideally, I'd like to do it in Perl b/c I hear Perl has some great
> > scraping/parsing features that would benefit me later on when I need
> > to do this kind of thing again.)
> >
> > Any suggestions?
> >
> > (I can copy the text into a txt file first, in case that makes the
> > scraping/parsing easier)
> > _______________________________________________
> > 909linux mailing list
> > [email protected]
> > http://909linux.org/cgi-bin/mailman/listinfo/909linux
> --
> Joel Brauer
> Manager IS
> Communications and Web Technologies
> [email protected]
> pager: [email protected]
> office: 909-558-7713
> cell: 909-534-1934
>
> Only you can decide to be happy! The rest of life is working out the
> details...
>
