First off, realise that a PDF isn't just a marked-up text document.
It's a container for text, images, movies and many other formats.

If you have a text PDF, then the text lives in content streams catalogued within the PDF, written with PostScript-like operators and usually compressed. I've never done this in Perl, but there are many commercial utilities around to do it, although for scanned, image-only PDFs it becomes an OCR process. There may be some info on CPAN about it, but if you can't find anything, then look into first extracting the PostScript-like content, and then there's bound to be a decoder for it available from somewhere. Not sure offhand though.
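
To give a feel for the "decode the extracted content" step, here is a rough Perl sketch rather than a real converter. It assumes you have already pulled a page's content stream out of the PDF and decompressed it, that the text is stored as literal (parenthesised) strings, and that the font uses a plain single-byte encoding. The names extract_text and unescape_pdf_string are made up for this example; they are not from any CPAN module. Hex strings, nested unescaped parentheses and custom font encodings are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a PDF literal string such as "(a\(b\))"
# into its raw characters by dropping the outer parentheses and
# resolving backslash escapes.
sub unescape_pdf_string {
    my ($s) = @_;
    $s = substr($s, 1, -1);    # strip the enclosing ( and )
    my %esc = (n => "\n", r => "\r", t => "\t", b => "\b", f => "\f");
    $s =~ s{\\([0-7]{1,3}|\r?\n|.)}{
        my $c = $1;
          $c =~ /^[0-7]/    ? chr(oct($c) & 0xFF)   # octal character code
        : $c =~ /^\r?\n$/   ? ''                    # line continuation
        : exists $esc{$c}   ? $esc{$c}              # \n, \t and friends
        :                     $c                    # \( \) \\ and anything else
    }gse;
    return $s;
}

# Hypothetical helper: collect the strings shown by the text operators
# Tj, ' and " (one string each) and TJ (an array of strings and
# kerning numbers) in an already-decompressed content stream.
sub extract_text {
    my ($stream) = @_;
    my $str = qr/\((?:[^()\\]|\\.)*\)/s;          # literal string, no nesting
    my $num = qr/[-+]?(?:\d+\.?\d*|\.\d+)/;       # kerning adjustment
    my @out;
    while ($stream =~ /($str)\s*(?:Tj|'|")|\[((?:\s*(?:$str|$num))*)\s*\]\s*TJ/g) {
        if (defined $1) {
            push @out, unescape_pdf_string($1);
        }
        else {
            my $array = $2;
            my $line  = '';
            while ($array =~ /($str)/g) {
                $line .= unescape_pdf_string($1);
            }
            push @out, $line;
        }
    }
    return join "\n", @out;
}

# A small hand-written content stream, purely for illustration.
my $sample = <<'END';
BT
/F1 12 Tf
72 720 Td
(Hello, world) Tj
0 -14 Td
[(PDF) -250 ( text \(escaped\))] TJ
ET
END

print extract_text($sample), "\n";
```

Each text-showing operator becomes one output line here, which is a simplification: a real converter would also track the positioning operators (Td, Tm and so on) to decide where spaces and line breaks belong.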

Jenny Chen wrote:

Hi Everyone,

Does anyone know how to convert a pdf file to text
file in Perl?  Thanks

Jenny



