Christian,

Christian Eric Paran wrote
> is it possible to fix the format of PDF content when BEING extracted? like
> removing the `Newlines`

What you call 'fixing' the format, others might call 'breaking' it. In
general, therefore, the standard text extraction strategies should return
the content as untampered as possible; they should merely insert the line
breaks and word divisions detected from coordinate jumps.

That being said, you can obviously create your own text extraction strategy
which ignores gaps between text segments and even throws away white space in
the content, as your current regular expressions do. Simply copy the
SimpleTextExtractionStrategy you currently use and adapt its renderText
method accordingly.
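
A minimal sketch of what such an adapted renderText could do. To keep it
runnable without the iText jar, the class below is a hypothetical stand-in
(GapIgnoringTextCollector is not an iText class) and renderText takes a plain
String. In a real iText 5 strategy the method receives a TextRenderInfo and
the string would come from renderInfo.getText(). Collapsing whitespace runs
to a single space is only an assumption about what your regular expressions
do.

```java
// Hypothetical stand-in for a copied and adapted SimpleTextExtractionStrategy.
final class GapIgnoringTextCollector {
    private final StringBuilder result = new StringBuilder();

    // In iText 5 the signature would be renderText(TextRenderInfo renderInfo)
    // and chunkText would be renderInfo.getText().
    void renderText(String chunkText) {
        // Collapse every whitespace run (including newlines) to one space.
        String normalized = chunkText.replaceAll("\\s+", " ");

        // Avoid a doubled space where two chunks meet, and a leading space
        // at the very start of the text.
        boolean seamHasSpace = result.length() == 0
                || result.charAt(result.length() - 1) == ' ';
        if (normalized.startsWith(" ") && seamHasSpace) {
            normalized = normalized.substring(1);
        }

        // No coordinate checks here: gaps between segments are ignored,
        // so no line breaks or extra spaces are ever inserted.
        result.append(normalized);
    }

    // Mirrors getResultantText() of the iText strategy interface.
    String getResultantText() {
        return result.toString().trim();
    }
}
```

Note that a collector like this only glues together what the content stream
delivers, in the order it is delivered.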

BTW,

Christian Eric Paran wrote
> I am making a Search Engine using PDF Files as a Source. When the PDF
> content is Extracted It has to be good looking.

Unless you can somehow be sure that the PDFs you search have page contents
ordered in reading order, the SimpleTextExtractionStrategy may be too simple
a strategy to use anyway.
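
To illustrate the reading order problem, here is a rough sketch that sorts
text chunks by position before joining them. iText 5 ships a
LocationTextExtractionStrategy that works along these lines, but the code
below is not that class. PositionedChunk and ReadingOrderSketch are
hypothetical names, and treating chunks as being on the same line only when
their y values are exactly equal is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for a text chunk and the start of its baseline.
final class PositionedChunk {
    final String text;
    final float x;
    final float y;

    PositionedChunk(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class ReadingOrderSketch {
    // Joins chunks top to bottom, then left to right.
    static String joinInReadingOrder(List<PositionedChunk> chunks) {
        List<PositionedChunk> sorted = new ArrayList<>(chunks);
        // PDF y coordinates grow upwards, so a larger y comes first.
        sorted.sort(Comparator
                .comparingDouble((PositionedChunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));

        StringBuilder out = new StringBuilder();
        PositionedChunk previous = null;
        for (PositionedChunk chunk : sorted) {
            if (previous != null) {
                // Same baseline: separate with a space; otherwise new line.
                out.append(previous.y == chunk.y ? ' ' : '\n');
            }
            out.append(chunk.text);
            previous = chunk;
        }
        return out.toString();
    }
}
```

Real pages need more care than this (rotated text, slightly differing
baselines, multiple columns), which is why the result still has to be checked
against the PDFs you actually index.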

Regards,   Michael



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Search-Text-and-Capacity-of-iText-to-read-tp4657270p4657275.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
