[iText-questions] Need to detect PDF's w/Image only vs. Text

021336 Tue, 27 Nov 2007 09:30:45 -0800

I am new to iText and could use some help getting pointed in the right
direction.

When we receive a PDF from an outside source that contains text, we would
prefer the text to be selectable so that it can be searched, rather than
being embedded in an image. If I can determine whether a PDF contains text
only, images and text, or images only, I can set aside the image-only files
so that a person can check visually whether each image is a true graphic or
a picture of text that should be OCR'ed.

Looking at sample files in the TreeViewPDF tool, I noticed that the
PdfDictionary lists a PdfArray object that can contain references to /PDF,
/Text, and /ImageB. For my samples, the entries in that array appeared to
distinguish these three scenarios correctly.
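
To make the idea concrete, here is a minimal sketch of the classification
step only, assuming the names have already been read out of the array (in the
PDF format this array is the /ProcSet entry of a page's resource dictionary,
and /ImageC and /ImageI are the other image entries it can hold). The class
and enum names are hypothetical and not part of iText, and reading the array
from the file is deliberately left out. Since the PDF Reference has treated
procedure sets as obsolete since version 1.4, a producer may list every name
or none at all, so the result is probably best treated as a hint rather than
proof.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of iText: classifies a page from the names
// found in its /ProcSet array.
final class ProcSetClassifier {

    enum Content { TEXT_ONLY, IMAGE_AND_TEXT, IMAGE_ONLY, NEITHER }

    // Image procedure sets defined by the PDF Reference:
    // grayscale, color, and indexed images.
    private static final Set<String> IMAGE_NAMES =
            new HashSet<String>(Arrays.asList("ImageB", "ImageC", "ImageI"));

    // Accepts names with or without the leading slash, e.g. "/Text" or "Text".
    static Content classify(List<String> procSetNames) {
        boolean hasText = false;
        boolean hasImage = false;
        for (String raw : procSetNames) {
            String name = raw.startsWith("/") ? raw.substring(1) : raw;
            if (name.equals("Text")) {
                hasText = true;
            } else if (IMAGE_NAMES.contains(name)) {
                hasImage = true;
            }
            // "PDF" and unknown names do not affect the result.
        }
        if (hasText && hasImage) return Content.IMAGE_AND_TEXT;
        if (hasText) return Content.TEXT_ONLY;
        if (hasImage) return Content.IMAGE_ONLY;
        return Content.NEITHER;
    }

    public static void main(String[] args) {
        // One line per scenario described above.
        System.out.println(classify(Arrays.asList("/PDF", "/Text")));
        System.out.println(classify(Arrays.asList("/PDF", "/Text", "/ImageB")));
        System.out.println(classify(Arrays.asList("/PDF", "/ImageB")));
    }
}
```

A file would go to the manual review pile only if every page came back as
IMAGE_ONLY; a page that reports NEITHER says nothing either way.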

Could someone provide a code snippet or point me in the right direction
(including references I could find in the iText book) that would show me how
the TreeViewPDF tool extracts this information from a PDF file? I would also
appreciate any comments on whether this is a legitimate way to solve my
problem.

Thanks for any help you might be able to provide.

