Re: [909linux] parsing PDF with Perl

Roger E. Rustad, Jr. Wed, 11 Oct 2006 16:48:25 -0700 (PDT)

Just ran pdftotext, and the output comes out worse than if I had just done a
select-all, copied, and pasted into another application.


It looks like I might need to play with some of those switches a bit more...
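
For anyone following along, here is a minimal sketch of the switch that seems most relevant, under some loud assumptions. pdftotext has a -layout option that tries to keep the original physical layout, so columns stay lined up on each row. The sketch is in Python rather than Perl, purely as an illustration; the same split on runs of two or more spaces carries over to a Perl split. The function names and the sample rows in the usage note are hypothetical, and the two-space rule is a guess about how the columns come out, not something checked against Business_List.pdf.

```python
import re
import subprocess
import sys


def pdf_to_layout_text(pdf_path):
    """Run pdftotext with -layout; the trailing "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def layout_to_delimited(text, delimiter="\t"):
    """Turn layout-preserved text into delimited rows.

    Assumption: columns are separated by two or more whitespace
    characters, and no single field contains such a run itself.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines and page-break leftovers
        fields = re.split(r"\s{2,}", line)
        rows.append(delimiter.join(fields))
    return rows


if __name__ == "__main__":
    # Usage: python pdf2tsv.py Business_List.pdf > Business_List.tsv
    for row in layout_to_delimited(pdf_to_layout_text(sys.argv[1])):
        print(row)
```

With a made-up input line such as "ACME HARDWARE      123 MAIN ST     RIVERSIDE", layout_to_delimited would give one row with three tab-separated fields. If the real columns are ever separated by a single space, this rule breaks and fixed column offsets would be the next thing to try.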

Thanks for the recommendation, Joel.

On 10/11/06, Joel Brauer <[email protected]> wrote:


I would start with pdftotext and then parse from there...

pdftotext is part of the poppler-utils package on my system (Ubuntu).

-joel

On Wed, 2006-10-11 at 15:13 -0700, Roger E. Rustad, Jr. wrote:
> I need to parse this PDF into a delimited text format
>
> http://www.riversideca.gov/finance/pdf/Business_List.pdf
>
> (Ideally, I'd like to do it in Perl b/c I hear Perl has some great
> scraping/parsing features that would benefit me later on when I need
> to do this kind of thing again.)
>
> Any suggestions?
>
> (I can copy the text into a txt file first, in case that makes the
> scraping/parsing easier)
> _______________________________________________
> 909linux mailing list
> [email protected]
> http://909linux.org/cgi-bin/mailman/listinfo/909linux
--
Joel Brauer
Manager IS
Communications and Web Technologies
[email protected]
pager: [email protected]
office: 909-558-7713
cell: 909-534-1934

Only you can decide to be happy! The rest of life is working out the
details...
