Christian,

Christian Eric Paran wrote
> is it possible to fix the format of PDF content when BEING extracted? like
> removing the `Newlines`

What you call 'fixing' the format, others might call 'breaking' it. In general, therefore, the standard text extraction strategies should return the content as untampered as possible; they should merely insert the line breaks and word divisions detected by coordinate jumps.

That being said, you can obviously create your own text extraction strategy which ignores gaps between text segments and even throws away whitespace in the content, as your current regular expressions do. Simply copy the SimpleTextExtractionStrategy you currently use and adapt its renderText method accordingly.

BTW,

Christian Eric Paran wrote
> I am making a Search Engine using PDF Files as a Source. When the PDF
> content is Extracted It has to be good looking.

Unless you can somehow be sure that the PDFs you search have their page contents ordered in reading order, the SimpleTextExtractionStrategy may be too simple a strategy to use anyway.

Regards,
Michael

--
View this message in context: http://itext-general.2136553.n4.nabble.com/Search-Text-and-Capacity-of-iText-to-read-tp4657270p4657275.html
Sent from the iText - General mailing list archive at Nabble.com.
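
P.S. To make the "adapt its renderText method" suggestion a little more concrete, here is a minimal sketch in plain Java. It deliberately does not use the iText API: the class `FlatTextCollector`, its `Jump` enum, and the two-argument `renderText` are hypothetical stand-ins. The assumption is that the caller has already classified the coordinate jump before each chunk, which in a copied strategy would be done from the render info passed to renderText. The only point shown is the policy: a detected line break is treated like a word gap, and whitespace inside a chunk is collapsed.

```java
/**
 * Hypothetical stand-in, NOT iText API: models the policy an adapted
 * renderText could apply once the jump before a chunk is known.
 */
final class FlatTextCollector {

    /** Assumed classification of the coordinate jump preceding a chunk. */
    enum Jump { NONE, WORD_GAP, LINE_BREAK }

    private final StringBuilder result = new StringBuilder();

    /** Appends one text chunk; never emits a newline. */
    void renderText(String chunkText, Jump jumpBefore) {
        // Collapse every run of whitespace inside the chunk to one space.
        String cleaned = chunkText.replaceAll("\\s+", " ");

        boolean endsWithSpace = result.length() == 0
                || result.charAt(result.length() - 1) == ' ';

        // A line break is handled exactly like a word gap: one space.
        if (jumpBefore != Jump.NONE && !endsWithSpace) {
            result.append(' ');
            endsWithSpace = true;
        }

        // Avoid doubled spaces where the chunk itself starts with one.
        if (endsWithSpace && cleaned.startsWith(" ")) {
            cleaned = cleaned.substring(1);
        }
        result.append(cleaned);
    }

    /** Returns the collected text without leading or trailing spaces. */
    String getResultantText() {
        return result.toString().trim();
    }
}
```

Chunks that arrive with `Jump.NONE` are joined directly, so a word split across several chunks stays in one piece. Note that this does nothing about reading order; it only changes what is written between chunks.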