It should not give me a blank page.  The page is actually a scanned 
image -- or somehow the entire mediabox is filled with an image of the 
document (I say somehow, because the Producer info says iText 2.1.4, but 
there's no actual Contents in the original document's page).

The "/Contents" of the page is actually what a user entered using the 
Acrobat Std/Pro "Touch-Up Text Tool".  That's all.  The actual 
text-content of the document isn't text at all; like I said, it's a 
single image that fills the entire mediabox.  Basically, they use that 
tool instead of a more appropriate Annotation mechanism such as a 
"stamp" or "text box".

Thus the /Contents and the actual document's content are entirely different.

Since the actual content of the document is an image, we are sending it 
to an OCR step.  If there is anything in the /Contents, the OCR engine 
assumes there is no need to OCR, and the output PDF is virtually the 
same as the input.  I need to remove that /Contents object so the OCR 
engine detects that it needs to OCR the underlying image rather than 
rely upon the existing text.

So I would expect that if I CAN remove the /Contents object from a page, 
and there is still an image filling the mediabox for that page, we would 
still have that displayed (and OCR'ed correctly).
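
For reference, here is a rough sketch of the "parse the content and keep 
only the good parts" route suggested in the quoted reply below.  It only 
matters if the scanned image turns out to be painted from the same 
content stream as the touch-up text (an image XObject is only displayed 
when a "Do" operator in the content stream invokes it).  This is plain 
Python, not iText code, and the names tokenize and strip_text_objects 
are made up for illustration.  It assumes the stream has already been 
decompressed (I believe PdfReader.getPageContent and setPageContent are 
the iText calls for reading and writing it back, but check that against 
your version), and it does not handle inline images (BI/ID/EI).

```python
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def tokenize(data):
    """Split a decoded content stream into tokens (comments are dropped)."""
    tokens = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):
            # Comment: skip to the end of the line.
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == ord("("):
            # Literal string: honour backslash escapes and nested parentheses.
            start, depth = i, 0
            while i < n:
                if data[i] == ord("\\"):
                    i += 1  # skip the escaped byte as well
                elif data[i] == ord("("):
                    depth += 1
                elif data[i] == ord(")"):
                    depth -= 1
                i += 1
                if depth == 0:
                    break
            tokens.append(data[start:i])
        elif data[i:i + 2] in (b"<<", b">>"):
            tokens.append(data[i:i + 2])
            i += 2
        elif c == ord("<"):
            # Hex string: runs to the closing '>'.
            end = data.find(b">", i)
            end = n if end < 0 else end + 1
            tokens.append(data[i:end])
            i = end
        elif c in b"[]{}>)":
            tokens.append(data[i:i + 1])
            i += 1
        else:
            # Name, number or operator: runs to whitespace or a delimiter.
            start = i
            i += 1
            while i < n and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
                i += 1
            tokens.append(data[start:i])
    return tokens


def strip_text_objects(data):
    """Drop everything between BT and ET; keep all other operators."""
    kept, in_text = [], False
    for tok in tokenize(data):
        if tok == b"BT":
            in_text = True
        elif tok == b"ET":
            in_text = False
        elif not in_text:
            kept.append(tok)
    return b" ".join(kept)


page = (b"q 612 0 0 792 0 0 cm /Im0 Do Q\n"
        b"BT /F1 12 Tf 72 700 Td (Reviewed (BT) by AJ) Tj ET\n")
print(strip_text_objects(page))
# b'q 612 0 0 792 0 0 cm /Im0 Do Q'
```

The point of tokenizing first, instead of a plain search for "BT", is 
that the letters BT or ET can legitimately appear inside a text string, 
as in the example.  The image-painting operators come through untouched, 
so the page should still render and the OCR engine should find no text.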


On 2/15/2012 9:32 AM, Leonard Rosenthol wrote:
> You can't remove the entire stream - that would give you a blank page!
>
> As Bruno said, you need to parse/analyze the page content and determine what 
> is "good" and what is "bad".
>
> Leonard
>
> -----Original Message-----
> From: AJ Weber [mailto:awe...@comcast.net]
> Sent: Wednesday, February 15, 2012 9:27 AM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] Strip Annotations?
>
> On 2/14/2012 11:02 AM, Leonard Rosenthol wrote:
>> Sure, it's possible that they are using some tool that adds text directly to 
>> the content instead of as an annotation.  Perfectly valid.
>>
>> In which case, removal is MUCH harder (but not impossible)
> OK...if I need to remove a page's /Contents object (and thus stream), can 
> anyone point to a quick method to do that?  Do I need to use one of the 
> "lower level" methods, or which class/method would be recommended?
>
> Thanks again,
> AJ
>
> ------------------------------------------------------------------------------
> Virtualization & Cloud Management Using Capacity Planning Cloud computing 
> makes use of virtualization - but cloud computing also focuses on allowing 
> computing to be delivered as a service.
> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> iText(R) is a registered trademark of 1T3XT BVBA.
> Many questions posted to this list can (and will) be answered with a 
> reference to the iText book: http://www.itextpdf.com/book/ Please check the 
> keywords list before you ask for examples: 
> http://itextpdf.com/themes/keywords.php
