Thanks chaps, I think this is helpful.
I decided to keep this job running on oDesk instead of diving in myself:
https://www.odesk.com/jobs/Finish-code-identify-redactions-PDF-document-using-iTextSharp_~~95d549b336de0217
Some day, some day, I will spend an afternoon with iText in Action to
understand PDF :)
On Tue, Sep 4, 2012 at 8:13 PM, mkl <[email protected]> wrote:
> Nick,
>
> 1T3XT BVBA wrote
> >
> >> http://stackoverflow.com/questions/12255508/how-to-extract-text-marked-for-redaction-from-a-pdf-using-itextsharp
> >
> > When iText extracts text from a PDF, it produces TextRenderInfo objects.
> > These objects contain the PDF Strings the way they are present in the PDF
> > syntax of a content stream. When you use a RegionTextRenderFilter, you get
> > all the snippets of text that intersect with the region of your
> > choice. The snippets aren't 'clipped' to match that region. That's
> > something you'd need to do in the postprocessing of the snippets by using
> > the metrics provided by the TextRenderInfo objects.
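
The postprocessing step described above can be sketched roughly as follows. This is only an illustration in plain Python, not iText code: the `Snippet` class, its fields, and all the numbers are made up for the example, and it assumes horizontal text where each character's advance width is already known from the snippet's metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    """Hypothetical stand-in for one extracted chunk of text."""
    text: str
    x_start: float             # left edge of the chunk on the page
    char_widths: List[float]   # advance width of each character


def clip_to_region(snippet: Snippet, left: float, right: float) -> str:
    """Keep only the characters whose horizontal midpoint lies in [left, right]."""
    kept = []
    x = snippet.x_start
    for ch, width in zip(snippet.text, snippet.char_widths):
        midpoint = x + width / 2
        if left <= midpoint <= right:
            kept.append(ch)
        x += width
    return "".join(kept)


# Made-up metrics: the chunk starts at x=100 and every character is 5 units wide,
# so the "V" occupies 130..135.
snippet = Snippet("diet. V", 100.0, [5.0] * 7)

# A region that begins at x=130 intersects the chunk, but only covers the "V".
clipped = clip_to_region(snippet, 130.0, 200.0)
print(repr(clipped))  # 'V'
```

Testing the midpoint rather than the full character box is a design choice for the sketch; a stricter or looser overlap rule may suit real redaction marks better.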
>
> Being curious, I had a look at your document; Bruno, of course, was right.
> For example, the first mark on page 2 of your markedForRedaction.pdf test
> file shows the problem quite clearly. The line
>
> commodo ultrices imperdiet. Vestibulum ut justo vel sapien venenatis
> tincidunt. Phasellus eget
>
> is marked for redaction from "Vestibulum" until the end. Looking at the
> page content, you find this line described in PDF syntax like this:
>
> [(c)-3(o)6(m)9(mo)4(do)5( ult)12(r)-3(i)3(c)-3(es)4( i)16(mp)-5(e)12(r)7
> (diet. V)10(es)4(tib)10(ulu)10(m u)-3(t j)14(ust)11(o)-4( v)3(el)15( s)5
> (ap)-4(i)3(en )11(ven)11(ena)7(tis)5( tin)12(c)-3(i)3(du)7(n)9(t. P)4
> (ha)-4(s)4(el)4(l)15(us ege)13(t )] TJ
>
> So the mark begins inside of "diet. V" and, therefore, your code returns
> the line prefixed by "diet. ".
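
To make the effect concrete, here is a small plain-Python sketch (no iText involved) that splits the TJ array quoted above into its string chunks and finds the chunk in which "Vestibulum" begins. The array is copied from the message with the e-mail line wraps joined; the regular expression is a simplification that assumes the strings contain no escaped or nested parentheses, and the helper name is made up for the example.

```python
import re

# The TJ array from the message, with the e-mail line wraps joined.
TJ_ARRAY = (
    "[(c)-3(o)6(m)9(mo)4(do)5( ult)12(r)-3(i)3(c)-3(es)4( i)16(mp)-5(e)12(r)7"
    "(diet. V)10(es)4(tib)10(ulu)10(m u)-3(t j)14(ust)11(o)-4( v)3(el)15( s)5"
    "(ap)-4(i)3(en )11(ven)11(ena)7(tis)5( tin)12(c)-3(i)3(du)7(n)9(t. P)4"
    "(ha)-4(s)4(el)4(l)15(us ege)13(t )]"
)

# Simplification: string operands are assumed to have no escaped parentheses.
chunks = re.findall(r"\(([^()]*)\)", TJ_ARRAY)
line = "".join(chunks)


def chunk_index_at(chunks, position):
    """Return the index of the chunk containing the given character position."""
    offset = 0
    for i, chunk in enumerate(chunks):
        if offset <= position < offset + len(chunk):
            return i
        offset += len(chunk)
    raise ValueError("position is outside the line")


mark_start = line.index("Vestibulum")
first = chunk_index_at(chunks, mark_start)

# Selecting whole chunks from the first intersecting one keeps the stray prefix.
whole_chunks = "".join(chunks[first:])
# Cutting at the character position gives the text that was actually marked.
clipped = line[mark_start:]

print(repr(chunks[first]))     # 'diet. V'
print(repr(whole_chunks[:16]))  # 'diet. Vestibulum'
print(repr(clipped[:10]))       # 'Vestibulum'
```

Working with character offsets is a stand-in here; on a real page the cut would be made from glyph positions rather than string indices.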
>
> Regards, Michael
>
>
>
> --
> View this message in context:
> http://itext-general.2136553.n4.nabble.com/How-to-extract-text-marked-for-redaction-tp4656134p4656139.html
> Sent from the iText - General mailing list archive at Nabble.com.
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> iText(R) is a registered trademark of 1T3XT BVBA.
> Many questions posted to this list can (and will) be answered with a
> reference to the iText book: http://www.itextpdf.com/book/
> Please check the keywords list before you ask for examples:
> http://itextpdf.com/themes/keywords.php
>