I am new to iText and could use some help getting pointed in the right direction.

When we receive a PDF from an outside source that contains text, we would prefer the text to be selectable so that it can be searched, rather than being text contained within an image. If I can determine whether a PDF contains text only, image and text, or image only, I can set aside the image-only files so that a person can visually check whether the image is a true graphic or is text in graphic form that should be OCR'ed.

Looking at sample files in the TreeViewPDF tool, I noticed that the PdfDictionary listed a PdfArray object that can contain references to /PDF, /Text, and /ImageB, and these appeared to distinguish the three scenarios correctly.

Could someone provide a code snippet or point me in the right direction (including references I could find in the iText book) that would show how the TreeViewPDF tool extracts this information from a PDF file? I would also welcome any comments on the legitimacy of solving my problem this way.

Thanks for any help you might be able to provide.
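
To make the idea concrete, here is a rough sketch of the three-way classification I have in mind. It is plain Java and does not touch iText: it assumes the names in the array (/PDF, /Text, /ImageB, and presumably also /ImageC and /ImageI) have already been read out of the file as strings, which is the part I still need help with. The class and method names are hypothetical, made up for illustration. Whether these entries can be trusted for this purpose is also an assumption on my part.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of iText: classifies a page from the
// names found in the array described above ("/PDF", "/Text", "/ImageB").
final class ProcSetClassifier {

    enum Kind { TEXT_ONLY, IMAGE_AND_TEXT, IMAGE_ONLY, UNKNOWN }

    // Assumption: the names were already extracted from the PDF as strings.
    static Kind classify(Collection<String> procSetNames) {
        boolean hasText = false;
        boolean hasImage = false;
        for (String name : procSetNames) {
            if (name.equals("/Text")) {
                hasText = true;
            } else if (name.startsWith("/Image")) {
                // Covers /ImageB, /ImageC and /ImageI.
                hasImage = true;
            }
        }
        if (hasText && hasImage) return Kind.IMAGE_AND_TEXT;
        if (hasText) return Kind.TEXT_ONLY;
        if (hasImage) return Kind.IMAGE_ONLY;
        // Only /PDF, or nothing at all: no basis for a decision.
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        List<String> scanned = Arrays.asList("/PDF", "/ImageB");
        System.out.println(classify(scanned)); // prints IMAGE_ONLY
    }
}
```

Files that come back as IMAGE_ONLY would be the ones set aside for a person to review.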
