RE: [iText-questions] hidden text

Paulo Soares Mon, 07 Jun 2004 03:30:19 -0700

It shouldn't be too difficult if you know the PDF format. The steps are:

- Parse each page's content stream with PRTokeniser and keep only the operators that draw the images, dropping the text.

- Remove the font references from the resources dictionary.

- Call pdfReader.removeUnusedObjects() so that the font objects that are no longer referenced get dropped.

- Use PdfStamper to write out the PDF.
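
To give an idea of what the first step involves, here is a minimal sketch in plain Java. It does not use iText: the small tokenize() method is only a stand-in for PRTokeniser, and the class and method names (HiddenTextStripper, stripTextObjects) are made up for illustration. It assumes the content stream has already been decompressed into a string, that all text sits inside BT ... ET text objects, and that the stream has no inline images (BI ... ID ... EI) or comments. It also discards any graphics state set inside a text object, so check the result on real files.

```java
import java.util.ArrayList;
import java.util.List;

final class HiddenTextStripper {

    /** Splits a decoded content stream into tokens; literal strings "(...)" stay whole. */
    static List<String> tokenize(String content) {
        List<String> tokens = new ArrayList<>();
        int n = content.length();
        int i = 0;
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                // Literal string: honour nested parentheses and backslash escapes.
                int depth = 0;
                do {
                    char d = content.charAt(i);
                    if (d == '\\') {
                        i++; // skip the escaped character
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                } while (i < n && depth > 0);
            } else if (c == '[' || c == ']') {
                i++; // array delimiters are tokens of their own
            } else {
                // Operator, number, name, etc.: read up to whitespace or a delimiter.
                i++;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && "()[]/".indexOf(content.charAt(i)) < 0) {
                    i++;
                }
            }
            tokens.add(content.substring(start, Math.min(i, n)));
        }
        return tokens;
    }

    /** Returns the content stream with every BT ... ET text object removed. */
    static String stripTextObjects(String content) {
        StringBuilder out = new StringBuilder();
        boolean inText = false;
        for (String token : tokenize(content)) {
            if (token.equals("BT")) {
                inText = true;
            } else if (token.equals("ET")) {
                inText = false;
            } else if (!inText) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One page: a full-page image followed by invisible (3 Tr) OCR text.
        String page = "q 612 0 0 792 0 0 cm /Im0 Do Q "
                + "BT /F1 12 Tf 3 Tr (GET ET now) Tj ET";
        System.out.println(stripTextObjects(page));
    }
}
```

The string "(GET ET now)" is there on purpose: splitting on whitespace alone would mistake the "ET" inside it for the end of the text object, which is why a real tokenizer is needed. With iText, the filtered stream would then have to be written back to the page before doing the remaining steps.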

Best Regards,

Paulo Soares

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Whenham Patrick (Gecotec)
Sent: Monday, June 07, 2004 11:02 AM
To: [EMAIL PROTECTED]
Subject: [iText-questions] hidden text

Hi,

I have tons of image + hidden text PDFs (paper -> TIFF -> PDF -> OCR "image + hidden text").

To comply with the legal constraints we work under (100% quality OCR or none), I must remove the hidden text from the PDF files. This is the text generated by running "image + hidden text" OCR on image PDFs: it is not displayed, but it allows searching and copy & paste.

The output must be a PDF file that is no longer searchable and offers no means of selecting and copying text.

Is there any function in iText that allows this?

Thanks,
Patrick

Patrick WHENHAM
GECOTEC & UCB Pharma

