I didn't notice that you were using SimpleTextExtractionStrategy. You should definitely try the default text extraction strategy, LocationTextExtractionStrategy. It orders the text chunks by their position on the page instead of emitting them in content-stream order, so it is a lot better at pulling meaningful text out of PDFs.
As for matching text while you are doing the extraction: you certainly can do that by writing your own text extraction strategy, but I doubt very much that it would be worth it. The time spent parsing the PDF is *way* higher than the cost of any post-processing step you might be performing. That said, it looks like you are doing a bunch of regex substitutions, which could be a performance bottleneck. I'd suggest taking the text returned by the extraction strategy and making a single pass through it that does all of your substitutions at once. That's not really an iText question, though; it's a general text-processing question.
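A minimal sketch of the "single pass" idea, assuming the substitutions are plain literal replacements. The class name `SinglePassReplacer` and the sample rules (ligatures, soft hyphen, non-breaking space) are hypothetical and are not part of iText. The sketch compiles all keys into one alternation pattern once, then walks the extracted text a single time with `Matcher.appendReplacement`/`appendTail`. This avoids calling `String.replaceAll` once per rule, which rescans the whole text each time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies many literal substitutions in one scan of the text.
final class SinglePassReplacer {
    private final Map<String, String> replacements;
    private final Pattern pattern;

    SinglePassReplacer(Map<String, String> replacements) {
        if (replacements.isEmpty() || replacements.containsKey("")) {
            throw new IllegalArgumentException("need at least one non-empty key");
        }
        this.replacements = new LinkedHashMap<>(replacements);
        // Sort longest keys first: regex alternation takes the first alternative
        // that matches, so a longer key beats a shorter key that is its prefix.
        List<String> keys = new ArrayList<>(replacements.keySet());
        keys.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String key : keys) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key)); // treat each key literally
        }
        this.pattern = Pattern.compile(alternation.toString()); // compiled once, reused
    }

    String apply(String text) {
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            String replacement = replacements.get(m.group());
            // quoteReplacement stops '$' or '\' in a replacement being read as group refs
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical cleanup rules for text that came out of a PDF extractor.
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("\uFB01", "fi"); // "fi" ligature
        rules.put("\uFB02", "fl"); // "fl" ligature
        rules.put("\u00AD", "");   // soft hyphen
        rules.put("\u00A0", " ");  // non-breaking space
        SinglePassReplacer replacer = new SinglePassReplacer(rules);
        String extracted = "\uFB01rst\u00A0page, re\u00ADflow, \uFB02ag";
        System.out.println(replacer.apply(extracted)); // prints "first page, reflow, flag"
    }
}
```

If your substitutions are real regular expressions rather than literals, the same structure still works with named groups in one combined pattern. Either way, build the pattern once and reuse it for every page rather than recompiling it inside the extraction loop.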