Most likely, the PDF actually contains the text multiple times (either drawn right on top of itself, or slightly offset to make the text appear bold). Take a look at the content stream itself (either open the file in RUPS or dump the content using PdfContentReaderTool) and see whether that is the case.
If that is the case, you may be able to contribute an enhancement to LocationTextExtractionStrategy that detects overlapping words. It may be really tricky, but it's worth investigating if this is important to you.

View this message in context: http://itext-general.2136553.n4.nabble.com/Problems-on-text-extraction-on-iTextSharp-5-2-0-tp4652795p4652974.html
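To give a rough idea of what detecting overlapping words could look like, here is a minimal sketch in plain Java. It does not use the iText API at all: `Chunk`, `OverlapFilter`, `dropOverlaps` and the `tolerance` parameter are hypothetical names made up for illustration, and the assumption is that you have already collected each piece of text together with the coordinates of its start point. A chunk is treated as a duplicate when an earlier chunk has the same text and starts within `tolerance` units of it in both x and y.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: filters out text chunks that were
// drawn (almost) on top of an identical, earlier chunk.
class OverlapFilter {

    // Hypothetical stand-in for a positioned piece of text.
    static final class Chunk {
        final String text;
        final float x; // start x coordinate, in user space units (assumed)
        final float y; // baseline y coordinate, in user space units (assumed)

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Keeps the first occurrence of each chunk and drops later chunks with the
    // same text whose start point lies within `tolerance` in both x and y.
    static List<Chunk> dropOverlaps(List<Chunk> chunks, float tolerance) {
        List<Chunk> kept = new ArrayList<>();
        for (Chunk candidate : chunks) {
            boolean duplicate = false;
            for (Chunk earlier : kept) {
                if (earlier.text.equals(candidate.text)
                        && Math.abs(earlier.x - candidate.x) <= tolerance
                        && Math.abs(earlier.y - candidate.y) <= tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    // Joins the remaining chunks with single spaces, in list order.
    static String join(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.text);
        }
        return sb.toString();
    }
}
```

This is a quadratic scan, which is fine for a page's worth of chunks. The hard part in practice is picking a tolerance that catches fake-bold offsets without swallowing text that is legitimately repeated close by, and that would have to be tuned against real files.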
