Re: [iText-questions] Problems on text extraction on iTextSharp 5.2.0

Kevin Day Thu, 24 May 2012 06:29:48 -0700

Most likely, the PDF actually contains the text multiple times (either drawn
right on top of itself, or slightly offset to make the text appear bold).
Take a look at the content stream itself (either with RUPS, or by dumping the
content with PdfContentReaderTool) and see whether that is the case.


If that is the case, you may be able to contribute an enhancement to
LocationTextExtractionStrategy that detects overlapping words. It may be
really tricky, but it's worth investigating if this is important to you.
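
To give a rough idea of what such overlap detection might look like, here is a
minimal sketch that does not depend on iText at all. Chunk, OverlapFilter,
dropOverlaps and the tolerance parameter are all hypothetical names made up
for illustration; they are not part of LocationTextExtractionStrategy. The
sketch assumes you have already collected each piece of text together with
the start point of its baseline, and it simply drops any piece whose text
matches an earlier piece at (nearly) the same position:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not iText API: filters out text that was drawn twice
// at (almost) the same position, e.g. to fake a bold effect.
final class OverlapFilter {

    // Illustrative stand-in for one piece of extracted text and the
    // start point of its baseline in page coordinates.
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Keeps the first occurrence of each chunk and drops later chunks with
    // identical text whose start point lies within `tolerance` units of a
    // kept chunk on both axes. Quadratic in the number of chunks, which is
    // acceptable for a single page but is an assumption worth revisiting.
    static List<Chunk> dropOverlaps(List<Chunk> chunks, float tolerance) {
        List<Chunk> kept = new ArrayList<>();
        for (Chunk candidate : chunks) {
            boolean duplicate = false;
            for (Chunk earlier : kept) {
                if (earlier.text.equals(candidate.text)
                        && Math.abs(earlier.x - candidate.x) <= tolerance
                        && Math.abs(earlier.y - candidate.y) <= tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

The hard part that this leaves out is choosing the tolerance (it probably
needs to scale with the font size) and handling duplicates that are split
into differently sized pieces, which is why a real implementation may be
tricky.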


