Here's one approach you might try.

1. Print the PDF to a PostScript file.
2. Open the file in a text editor to check whether the text is in the order
you want (it might not be).
3. Write a simple script that parses the PS file line by line, using a regular
expression to match the desired text and drop the characters that are wrapped
around it.
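
As a rough sketch of step 3, something like the Python below could be a
starting point. It assumes the text appears in the PS file as plain
parenthesized string literals, e.g. (Hello) show. That is an assumption about
what your printer driver emits: hex strings, re-encoded font subsets, and
nested unescaped parentheses are not handled. The names extract_text,
PS_STRING and ESCAPE are made up for this illustration.

```python
import re

# Matches a PostScript string literal such as (Hello) and captures its body.
# Backslash escapes like \( and \) are allowed inside the string.
# Assumption: no nested unescaped parentheses and no hex strings <...>.
PS_STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)")

# Matches the escapes \( \) and \\ so they can be turned back into ( ) and \.
ESCAPE = re.compile(r"\\([()\\])")


def extract_text(ps_lines):
    """Return the text found in string literals, one entry per PS line."""
    out = []
    for line in ps_lines:
        # Skip PostScript comment lines, which start with %.
        if line.lstrip().startswith("%"):
            continue
        # Keep only the string bodies; everything wrapped around them is dropped.
        parts = [ESCAPE.sub(r"\1", body) for body in PS_STRING.findall(line)]
        if parts:
            out.append("".join(parts))
    return out
```

To try it, open the printed file (for example with encoding="latin-1"), pass
the file object to extract_text, and print the returned list. If the result
looks scrambled, that is the ordering problem mentioned in step 2.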

Cheers,
Bill Segraves
-------------- Original message ----------------------
From: Oscar P <[email protected]>
>
> Hi,
> 
> I want to extract the text of different PDFs. I have seen that iText
> includes PdfTextExtractor, but it does not work with PDFs generated with
> Acrobat 8 Professional. Is there a way to extract the text of these PDFs?
> 
> 
> Regards, Oscar

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions