It should not give me a blank page. The page is actually a scanned image -- or somehow the entire mediabox is filled with an image of the document (I say somehow, because the Producer info says iText 2.1.4, but there's no actual Contents in the original document's page).
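
For illustration only, here is a minimal sketch of what such a page can look like at the content-stream level. It assumes the page's content stream has already been decoded into a string, that the scanned image is painted by a `Do` operator, and that the typed-in text sits inside `BT ... ET` text objects. The class and method names (`TextObjectStripper`, `stripTextObjects`) are hypothetical and are not part of iText; inline images (`BI ... ID ... EI`) and array-style text handling beyond plain tokens are not treated specially.

```java
// Hypothetical helper, not iText API: drops BT ... ET text objects from an
// already-decoded content stream and keeps everything else (e.g. image painting).
class TextObjectStripper {

    /** Returns the content with every BT ... ET text object removed. */
    static String stripTextObjects(String content) {
        StringBuilder out = new StringBuilder();
        boolean inText = false;
        int i = 0;
        int n = content.length();
        while (i < n) {
            int start = i;
            char c = content.charAt(i);
            if (c == '(') {
                // Literal string: may contain "ET" as plain text, so skip it whole.
                i = skipLiteralString(content, i);
            } else if (c == '%') {
                // Comment runs to the end of the line.
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') i++;
            } else if (c == '/') {
                // Name object such as /Im0 or /F1: keep the slash with the name.
                i++;
                while (i < n && isRegular(content.charAt(i))) i++;
            } else if (!isRegular(c)) {
                // Single whitespace or delimiter character.
                i++;
            } else {
                // Operator or number.
                while (i < n && isRegular(content.charAt(i))) i++;
            }
            String token = content.substring(start, i);
            if (!inText && token.equals("BT")) {
                inText = true;          // start of a text object: stop copying
                continue;
            }
            if (inText) {
                if (token.equals("ET")) inText = false;  // end of the text object
                continue;
            }
            out.append(token);
        }
        return out.toString();
    }

    /** Returns the index just past the literal string that opens at 'open'. */
    static int skipLiteralString(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++;                    // skip the escaped character
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i + 1;
            }
        }
        return s.length();              // unterminated string: consume the rest
    }

    static boolean isRegular(char c) {
        return !Character.isWhitespace(c) && "()<>[]{}/%".indexOf(c) < 0;
    }

    public static void main(String[] args) {
        // Assumed example stream: one full-page image plus one line of typed-in text.
        String page = "q 612 0 0 792 0 0 cm /Im0 Do Q\n"
                    + "BT /F1 12 Tf 72 700 Td (Not ET yet) Tj ET\n";
        System.out.print(stripTextObjects(page));
    }
}
```

In this sketch the image-painting operators survive while the text object is dropped, which is the "good" versus "bad" split a filter would have to make before the result is written back to the page.
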

The "/Contents" of the page is actually what a user entered using the Acrobat Std/Pro "Touch-Up Text Tool". That's all. The actual text content of the document isn't text at all; like I said, it's a single image that fills the entire mediabox. Basically, they use that tool instead of a more appropriate annotation mechanism such as a "stamp" or "text box". Thus the /Contents and the actual document's content are entirely different.

Since the actual content of the document is an image, we are sending it to an OCR step. If there is something in the /Contents, the OCR engine assumes there is no need to OCR, and the result is virtually the same output PDF. I need to remove that /Contents object so the OCR engine detects that it needs to OCR the underlying image rather than rely on the existing text.

So I would expect that if I CAN remove the /Contents object from a page, and there is still an image filling the mediabox for that page, we would still have that displayed (and OCR'ed correctly).

On 2/15/2012 9:32 AM, Leonard Rosenthol wrote:
> You can't remove the entire stream - that would give you a blank page!
>
> As Bruno said, you need to parse/analyze the page content and determine what
> is "good" and what is "bad".
>
> Leonard
>
> -----Original Message-----
> From: AJ Weber [mailto:awe...@comcast.net]
> Sent: Wednesday, February 15, 2012 9:27 AM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] Strip Annotations?
>
> On 2/14/2012 11:02 AM, Leonard Rosenthol wrote:
>> Sure, it's possible that they are using some tool that adds text directly to
>> the content instead of as an annotation. Perfectly valid.
>>
>> In which case, removal is MUCH harder (but not impossible)
> OK...if I need to remove a page's /Contents object (and thus stream), can
> anyone point to a quick method to do that? Do I need to use one of the
> "lower level" methods, or which class/method would be recommended?
>
> Thanks again,
> AJ

------------------------------------------------------------------------------
Virtualization & Cloud Management Using Capacity Planning
Cloud computing makes use of virtualization - but cloud computing
also focuses on allowing computing to be delivered as a service.
http://www.accelacomm.com/jaw/sfnl/114/51521223/
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a
reference to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples:
http://itextpdf.com/themes/keywords.php