I just ran pdftotext, and the output is worse than if I had simply selected all, copied, and pasted into another application.
It looks like I might need to play with some of its switches a bit more... Thanks for the recommendation, Joel.

On 10/11/06, Joel Brauer <[email protected]> wrote:
I would start with pdftotext and then parse from there... pdftotext is
part of the poppler-utils package on my system (Ubuntu).

-joel

On Wed, 2006-10-11 at 15:13 -0700, Roger E. Rustad, Jr. wrote:
> I need to parse this PDF into a delimited text format
>
> http://www.riversideca.gov/finance/pdf/Business_List.pdf
>
> (Ideally, I'd like to do it in Perl b/c I hear Perl has some great
> scraping/parsing features that would benefit me later on when I need
> to do this kind of thing again.)
>
> Any suggestions?
>
> (I can copy the text into a txt file first, in case that makes the
> scraping/parsing easier)
> _______________________________________________
> 909linux mailing list
> [email protected]
> http://909linux.org/cgi-bin/mailman/listinfo/909linux

--
Joel Brauer
Manager IS Communications and Web Technologies
[email protected]
pager: [email protected]
office: 909-558-7713
cell: 909-534-1934

Only you can decide to be happy! The rest of life is working out the details...
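
For anyone following the thread, here is a rough sketch of the "pdftotext, then parse" approach. It is written in Python rather than Perl and rests on assumptions: that pdftotext's -layout switch keeps the columns visually aligned for this PDF, and that columns are separated by runs of two or more spaces. The actual column layout of Business_List.pdf has not been checked, and the function names and the "|" delimiter are made up for illustration.

```python
import csv
import io
import re
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run pdftotext with -layout; the trailing "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def layout_text_to_rows(text):
    """Split each non-blank line into fields on runs of 2+ whitespace chars.

    Assumption: single spaces occur only inside a field (e.g. a business
    name), while column gaps are at least two spaces wide.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records and pages
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_delimited(rows, delimiter="|"):
    """Serialize rows as delimited text; csv handles any quoting needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Usage would be along the lines of rows_to_delimited(layout_text_to_rows(pdf_to_layout_text("Business_List.pdf"))). If the two-space heuristic splits fields in the wrong places, splitting at fixed character offsets read off the -layout output may work better.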
