I don't think anyone needs to concur with Leonard; he's a bit of a PDF guru ;).
But I'll chime in that I looked into this as well, and it's not trivial. Every
page's content stream is itself made up of text: the operators and operands
defined by the PDF specification. So you can't simply check whether a page
contains any text at all. You have to build a parser that separates the known
formatting operators from the actual document content. I'm sure it can be done,
but I never got the time to look into it further. I got as far as looking at the
font types and character sets to determine what part of the world the document
came from (so I could send it to the appropriate OCR engine).
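To make the "separate the formatting from the content" step concrete, here is a
minimal Java sketch of such a parser (Java only because iText is a Java library;
the code does not use iText). It assumes you already have one page's decoded
content-stream bytes, and it keeps only the string operands of the text-showing
operators `Tj`, `TJ`, `'` and `"`, discarding operators, names and numbers. The
class and method names are hypothetical; string escapes are simplified and
inline-image data is skipped with a heuristic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: separate a page content stream's operators and
 * "formatting" operands (names, numbers) from the strings that text-showing
 * operators actually paint. Assumes the stream is already decoded.
 */
public class ContentTextExtractor {

    static boolean isWs(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '\f' || b == 0;
    }

    static boolean isDelim(byte b) {
        return "()<>[]{}/%".indexOf(b & 0xFF) >= 0;
    }

    /** Raw string operands of the text-showing operators Tj, TJ, ' and ". */
    public static List<String> shownStrings(byte[] s) {
        List<String> shown = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // strings seen since the last operator
        int i = 0, n = s.length;
        while (i < n) {
            char c = (char) (s[i] & 0xFF);
            if (isWs(s[i]) || c == '[' || c == ']') {  // brackets only group TJ operands
                i++;
            } else if (c == '%') {                      // comment: skip to end of line
                while (i < n && s[i] != '\n' && s[i] != '\r') i++;
            } else if (c == '(') {                      // literal string, parens may nest
                StringBuilder sb = new StringBuilder();
                int depth = 1;
                i++;
                while (i < n) {
                    char d = (char) (s[i] & 0xFF);
                    if (d == '\\' && i + 1 < n) {       // simplified: keep escaped byte as-is
                        sb.append((char) (s[i + 1] & 0xFF));
                        i += 2;
                        continue;
                    }
                    i++;
                    if (d == '(') depth++;
                    else if (d == ')' && --depth == 0) break;
                    sb.append(d);
                }
                pending.add(sb.toString());
            } else if ((c == '<' || c == '>') && i + 1 < n && s[i + 1] == c) {
                i += 2;                                 // dictionary delimiters << >>
            } else if (c == '<') {                      // hex string such as <48656C6C6F>
                int end = i + 1;
                while (end < n && s[end] != '>') end++;
                pending.add(decodeHex(s, i + 1, end));
                i = end + 1;
            } else {                                    // name, number or operator
                int start = i++;
                while (i < n && !isWs(s[i]) && !isDelim(s[i])) i++;
                String tok = new String(s, start, i - start, StandardCharsets.ISO_8859_1);
                if (tok.startsWith("/") || tok.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) continue;
                if (tok.equals("Tj") || tok.equals("TJ") || tok.equals("'") || tok.equals("\"")) {
                    shown.addAll(pending);
                }
                pending.clear();                        // every operator consumes its operands
                if (tok.equals("ID")) i = skipInlineImage(s, i);
            }
        }
        return shown;
    }

    static String decodeHex(byte[] s, int from, int to) {
        StringBuilder digits = new StringBuilder();
        for (int k = from; k < to; k++) {
            char d = (char) (s[k] & 0xFF);
            if (Character.digit(d, 16) >= 0) digits.append(d);
        }
        if (digits.length() % 2 == 1) digits.append('0'); // odd digit count: pad with 0
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < digits.length(); k += 2) {
            out.append((char) Integer.parseInt(digits.substring(k, k + 2), 16));
        }
        return out.toString();
    }

    /** Heuristic: binary data follows "ID"; resume after a whitespace-delimited "EI". */
    static int skipInlineImage(byte[] s, int i) {
        for (int k = i + 1; k + 1 < s.length; k++) {
            if (s[k] == 'E' && s[k + 1] == 'I' && isWs(s[k - 1])
                    && (k + 2 == s.length || isWs(s[k + 2]))) {
                return k + 2;
            }
        }
        return s.length;
    }
}
```

For `BT /F1 12 Tf (Hello) Tj ET` this returns `[Hello]`, throwing away `BT`,
`/F1`, `12`, `Tf` and `ET`. The strings come back in the font's own encoding:
for composite (Type0) fonts they are glyph codes, and mapping them to Unicode
needs the font's /ToUnicode CMap, which is where the font types and character
sets come into play.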
Good Luck!
-AJ
----- Original Message -----
From: Leonard Rosenthol
To: Post all your questions about iText here
Sent: Friday, February 23, 2007 7:06 AM
Subject: Re: [iText-questions] how to identify image only pdf?
On Feb 22, 2007, at 11:36 PM, a b wrote:
> Hi, I have already made the PDF file, but I need to
> identify whether the PDF is image-only or whether it
> also contains some text. I was wondering whether the iText
> API can identify that difference. Please let me know.
>
iText has no high-level API for this, though you can get all the
necessary information from iText by iterating over each page and
examining that page's content streams to see what's there.
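A rough sketch of that per-page check, written in plain Java with no iText
dependency so it runs on its own. It assumes you fetch each page's content with
iText's `PdfReader.getPageContent(int)` (pages are numbered from 1) and pass the
decoded bytes in; `PageClassifier`, `Kind` and the other names are hypothetical.
It only looks at which operators occur: `Tj`, `TJ`, `'` or `"` mean text is
shown, while `Do` (a named XObject) or `BI` (an inline image) mean something was
painted. A `Do` can also paint a form XObject that contains text of its own, and
some scanned PDFs carry an invisible OCR text layer (text render mode 3) that
this sketch counts as text, so treat the result as a hint rather than a verdict.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: classify pages by the operators in their decoded
 * content streams. It does NOT look up what a "Do" actually paints; a real
 * check would inspect /XObject entries in the page's /Resources.
 */
public class PageClassifier {

    public enum Kind { TEXT, IMAGES_OR_FORMS_ONLY, NEITHER }

    static boolean isWs(int b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '\f' || b == 0;
    }

    /** Operator names only; strings, names, numbers, comments and inline-image data are skipped. */
    static Set<String> operators(byte[] s) {
        Set<String> ops = new LinkedHashSet<>();
        int i = 0, n = s.length;
        while (i < n) {
            int c = s[i] & 0xFF;
            if (isWs(c) || c == '[' || c == ']') {
                i++;
            } else if (c == '%') {                                   // comment
                while (i < n && s[i] != '\n' && s[i] != '\r') i++;
            } else if (c == '(') {                                   // literal string, may nest
                int depth = 0;
                do {
                    int d = s[i] & 0xFF;
                    if (d == '\\') i++;                              // skip the escaped byte
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
            } else if ((c == '<' || c == '>') && i + 1 < n && (s[i + 1] & 0xFF) == c) {
                i += 2;                                              // dictionary << or >>
            } else if (c == '<') {                                   // hex string
                while (i < n && s[i] != '>') i++;
                i++;
            } else {                                                 // name, number or operator
                int start = i++;
                while (i < n && !isWs(s[i] & 0xFF) && "()<>[]{}/%".indexOf(s[i] & 0xFF) < 0) i++;
                String tok = new String(s, start, i - start, StandardCharsets.ISO_8859_1);
                if (tok.startsWith("/") || tok.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) continue;
                ops.add(tok);
                if (tok.equals("ID")) i = skipInlineImageData(s, i);
            }
        }
        return ops;
    }

    /** Heuristic: resume after the first whitespace-delimited "EI" following "ID". */
    static int skipInlineImageData(byte[] s, int i) {
        for (int k = i + 1; k + 1 < s.length; k++) {
            if (s[k] == 'E' && s[k + 1] == 'I' && isWs(s[k - 1] & 0xFF)
                    && (k + 2 == s.length || isWs(s[k + 2] & 0xFF))) {
                return k + 2;
            }
        }
        return s.length;
    }

    public static Kind classify(byte[] contentStream) {
        Set<String> ops = operators(contentStream);
        if (ops.contains("Tj") || ops.contains("TJ") || ops.contains("'") || ops.contains("\"")) {
            return Kind.TEXT;
        }
        // "Do" paints a named XObject (image or form); "BI" begins an inline image.
        return (ops.contains("Do") || ops.contains("BI")) ? Kind.IMAGES_OR_FORMS_ONLY : Kind.NEITHER;
    }

    /** True when no page shows text directly: only a hint that the PDF is image-only. */
    public static boolean noPageShowsText(List<byte[]> pages) {
        for (byte[] page : pages) {
            if (classify(page) == Kind.TEXT) return false;
        }
        return true;
    }
}
```

For a typical scanned page such as `q 612 0 0 792 0 0 cm /Im0 Do Q`, `classify`
returns `IMAGES_OR_FORMS_ONLY`. Because strings are skipped rather than
scanned, a `(Tj)` inside a marked-content dictionary is not mistaken for the
operator.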
Leonard
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://itext.ugent.be/itext-in-action/