Here's an approach you might try:

1. Print the PDF to a PostScript file.
2. Open the file with a text editor to see if the text is in the order you desire (it might not be).
3. Write a simple script that parses the PS file, line by line, with a regular expression that matches the desired text characters and drops the characters wrapped around them.
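As a rough sketch of step 3, here is one way such a script could look in Python. It assumes the text appears in the PS file as ordinary parenthesized string literals, e.g. (Hello) show, with no unescaped nested parentheses. That assumption may not hold for your files: PostScript printed from Acrobat can use hex strings or re-encoded fonts, in which case the pattern would need adjusting. The names PS_STRING, unescape, and extract_text are made up for this illustration.

```python
import re

# Matches a PostScript string literal such as (Hello \(world\)).
# Assumption: no unescaped nested parentheses inside the literal.
PS_STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)")

# Single-character escapes defined for PostScript strings.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "\\": "\\", "(": "(", ")": ")"}


def unescape(raw):
    """Resolve backslash escapes, including octal codes such as \\053."""
    def repl(match):
        code = match.group(1)
        if code[0] in "01234567":
            return chr(int(code, 8))
        return ESCAPES.get(code, code)
    return re.sub(r"\\([0-7]{1,3}|.)", repl, raw)


def extract_text(lines):
    """Return one text fragment per line that contains string literals."""
    fragments = []
    for line in lines:
        if line.lstrip().startswith("%"):
            continue  # skip PostScript comment lines
        pieces = [unescape(m.group(1)) for m in PS_STRING.finditer(line)]
        if pieces:
            fragments.append("".join(pieces))
    return fragments


# Hand-written sample lines, not taken from a real Acrobat print job.
sample = [
    "%!PS-Adobe-3.0",
    "/F1 12 Tf",
    "(Hello) show",
    "[(Wor) -20 (ld)] TJ",
    r"(a \(b\) c) show",
    r"(x\053y) show",
]
result = extract_text(sample)
print(result)
```

To run it on a real file, you could pass an open file object instead of the sample list, for example extract_text(open(path, encoding="latin-1")), where path is wherever you saved the PostScript output. The fragments come out in file order, which is why step 2 is worth doing first.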
Cheers,
Bill Segraves

-------------- Original message ----------------------
From: Oscar P <[email protected]>
>
> Hi,
>
> I want to extract the text of different PDFs. I have seen that iText
> includes PdfTextExtractor, but it does not work with PDFs generated with
> Acrobat 8 Professional. Is there a way to extract the text of these PDFs?
>
> Regards, Oscar

_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions
