Nick,

1T3XT BVBA wrote
> 
>> http://stackoverflow.com/questions/12255508/how-to-extract-text-marked-for-redaction-from-a-pdf-using-itextsharp
>>  
> 
> When iText extracts text from a PDF, it produces TextRenderInfo objects.
> These objects contain the PDF Strings the way they are present in the PDF
> syntax of a content stream. When you use a RegionTextRenderFilter, you get
> all the snippets of text that intersect with the region of your 
> choice. The snippets aren't 'clipped' to match that region. That's
> something you'd need to do in the postprocessing of the snippets by using
> the metrics provided by the TextRenderInfo objects.

Being curious, I had a look at your document; Bruno, of course, was right.
The first mark on page 2 of your markedForRedaction.pdf test file shows the
problem quite clearly. The line

commodo ultrices imperdiet. Vestibulum ut justo vel sapien venenatis
tincidunt. Phasellus eget

is marked for redaction from "Vestibulum" until the end. Looking at the page
content you find this line being described in PDF syntax like this:

[(c)-3(o)6(m)9(mo)4(do)5( ult)12(r)-3(i)3(c)-3(es)4( i)16(mp)-5(e)12(r)7
(diet. V)10(es)4(tib)10(ulu)10(m u)-3(t j)14(ust)11(o)-4( v)3(el)15( s)5
(ap)-4(i)3(en )11(ven)11(ena)7(tis)5( tin)12(c)-3(i)3(du)7(n)9(t. P)4
(ha)-4(s)4(el)4(l)15(us ege)13(t )] TJ

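So the mark begins inside the string "diet. V". That string is a single text
snippet, and snippets are returned whole rather than clipped to the region,
so your code returns the marked text prefixed by "diet. ".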
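
To make the effect easier to see, here is a small stand-alone Python sketch.
It does not use iText at all. It only splits the TJ operand quoted above into
its strings and compares "return every string that touches the mark" with
"trim to where the mark starts". The helper names (tj_chunks,
whole_chunks_from) are made up for this illustration, and the character
offset of "Vestibulum" stands in for the real geometric test, which would
have to use the metrics from the TextRenderInfo objects as Bruno described.

```python
import re

# TJ operand copied from the page content quoted above (kerning numbers kept).
TJ = (
    "[(c)-3(o)6(m)9(mo)4(do)5( ult)12(r)-3(i)3(c)-3(es)4( i)16(mp)-5(e)12(r)7"
    "(diet. V)10(es)4(tib)10(ulu)10(m u)-3(t j)14(ust)11(o)-4( v)3(el)15( s)5"
    "(ap)-4(i)3(en )11(ven)11(ena)7(tis)5( tin)12(c)-3(i)3(du)7(n)9(t. P)4"
    "(ha)-4(s)4(el)4(l)15(us ege)13(t )]"
)


def tj_chunks(operand):
    """Return the strings of a TJ array, ignoring the kerning numbers.

    Simplification: assumes no escaped or nested parentheses in the strings.
    """
    return re.findall(r"\(([^()]*)\)", operand)


def whole_chunks_from(chunks, marked_start):
    """Return the text from the start of the chunk containing marked_start.

    This mimics a filter that keeps whole strings intersecting the mark
    instead of clipping them.
    """
    pos = 0
    for i, chunk in enumerate(chunks):
        if pos <= marked_start < pos + len(chunk):
            return "".join(chunks[i:])
        pos += len(chunk)
    return ""


chunks = tj_chunks(TJ)
line = "".join(chunks)

# Assumption: the mark starts exactly at "Vestibulum". In a real PDF this
# position would come from comparing glyph coordinates with the mark.
start = line.index("Vestibulum")

unclipped = whole_chunks_from(chunks, start)  # whole strings, as extracted
clipped = line[start:]                        # trimmed in postprocessing

print(repr(unclipped))
print(repr(clipped))
```

The first printed value starts with "diet. " because "diet. V" is one string
in the array; the second shows what you get once the snippet is trimmed to
the start of the mark.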

Regards,   Michael



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/How-to-extract-text-marked-for-redaction-tp4656134p4656139.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
