Nick, 1T3XT BVBA wrote
>> http://stackoverflow.com/questions/12255508/how-to-extract-text-marked-for-redaction-from-a-pdf-using-itextsharp
>
> When iText extracts text from a PDF, it produces TextRenderInfo objects.
> These objects contain the PDF Strings the way they are present in the PDF
> syntax of a content stream. When you use a RegionTextRenderFilter, you get
> all the snippets of text that intersect with the region of your
> choice. The snippets aren't 'clipped' to match that region. That's
> something you'd need to do in the postprocessing of the snippets by using
> the metrics provided by the TextRenderInfo objects.

Being curious, I had a look at your document; Bruno, of course, was right.

For example, the first mark on page 2 of your markedForRedaction.pdf test file shows the problem quite clearly. The line

    commodo ultrices imperdiet. Vestibulum ut justo vel sapien venenatis tincidunt. Phasellus eget

is marked for redaction from "Vestibulum" until the end. Looking at the page content, you find this line described in PDF syntax like this:

    [(c)-3(o)6(m)9(mo)4(do)5( ult)12(r)-3(i)3(c)-3(es)4( i)16(mp)-5(e)12(r)7(diet. V)10(es)4(tib)10(ulu)10(m u)-3(t j)14(ust)11(o)-4( v)3(el)15( s)5(ap)-4(i)3(en )11(ven)11(ena)7(tis)5( tin)12(c)-3(i)3(du)7(n)9(t. P)4(ha)-4(s)4(el)4(l)15(us ege)13(t )] TJ

So the mark begins inside the string "diet. V". That string intersects the marked region, so it is returned as a whole, unclipped snippet, and therefore your code returns the line prefixed by "diet. ".

Regards,
Michael
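
P.S. To make the effect reproducible without iText, here is a minimal Python sketch that works on the TJ array quoted above. It is only an illustration under stated assumptions, not what iText does internally: the helper names (tj_strings, chunk_level_selection, char_level_selection) are hypothetical, the parser assumes no escaped or nested parentheses, and a plain string index stands in for the geometric comparison against the redaction rectangle that real postprocessing would do with the TextRenderInfo metrics.

```python
import re

# TJ array copied from the page content quoted in this message.
TJ_ARRAY = (
    "[(c)-3(o)6(m)9(mo)4(do)5( ult)12(r)-3(i)3(c)-3(es)4( i)16(mp)-5(e)12"
    "(r)7(diet. V)10(es)4(tib)10(ulu)10(m u)-3(t j)14(ust)11(o)-4( v)3(el)15"
    "( s)5(ap)-4(i)3(en )11(ven)11(ena)7(tis)5( tin)12(c)-3(i)3(du)7(n)9"
    "(t. P)4(ha)-4(s)4(el)4(l)15(us ege)13(t )]"
)


def tj_strings(tj_array):
    """Return the literal strings of a TJ array, ignoring kerning numbers.

    Simplification (assumption): no escaped or nested parentheses.
    """
    return re.findall(r"\(([^()]*)\)", tj_array)


def chunk_level_selection(chunks, marked_text):
    """Mimic an unclipped filter: keep whole chunks, starting with the
    chunk in which the marked text begins."""
    start = "".join(chunks).index(marked_text)
    pos = 0
    for i, chunk in enumerate(chunks):
        if pos + len(chunk) > start:  # the mark starts inside this chunk
            return "".join(chunks[i:])
        pos += len(chunk)
    return ""


def char_level_selection(chunks, marked_text):
    """Mimic clipping in postprocessing: decide per character, not per chunk."""
    text = "".join(chunks)
    return text[text.index(marked_text):]


chunks = tj_strings(TJ_ARRAY)
print(repr(chunk_level_selection(chunks, "Vestibulum")))
print(repr(char_level_selection(chunks, "Vestibulum")))
```

The first result starts with "diet. " because the whole string "diet. V" is kept; the second starts at "Vestibulum". In iText itself, versions that provide TextRenderInfo.getCharacterRenderInfos() should allow a comparable per-character decision based on each character's position, but I have not tested that against your file.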
