I don't think anyone needs to concur with Leonard...he's a bit of a PDF guru ;).

But I'll chime in that I looked into this as well, and it's not trivial.  Each
page's content stream will have some text in it that belongs to the PDF format
itself (operators and their operands), so you can't just check whether
there's any text on the page.  You have to build a parser that separates the
known formatting text from the actual document content.  I'm sure it can be
done, but I never found the time to look into it further.  I got as far as
looking at the font types and character sets to work out what part of the
world the document came from (so I could send it to the appropriate OCR
engine).
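
For what it's worth, here is a rough sketch of the kind of parser I mean. It
assumes you already have the decoded (uncompressed) bytes of one page's content
stream. ContentStreamScan and its methods are hypothetical names of my own, not
part of iText. The sketch relies on the PDF format's text-showing operators
(Tj, TJ, ' and ") and its image operators (Do for XObjects, BI for inline
images). It is a simplification, not a complete PDF lexer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an iText class): separates operator keywords
// from operands in a decoded page content stream.
final class ContentStreamScan {

    // Returns the operator keywords in order, skipping strings, names and numbers.
    // The operand keywords true/false/null also land in the list; harmless here.
    static List<String> operators(byte[] stream) {
        String s = new String(stream, StandardCharsets.ISO_8859_1); // one char per byte
        List<String> ops = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (c == '(') {                          // literal string, parentheses may nest
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    char d = s.charAt(i++);
                    if (d == '\\') i++;              // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                }
            } else if (s.startsWith("<<", i)) {      // dictionary start, not a hex string
                i += 2;
            } else if (c == '<') {                   // hex string
                int close = s.indexOf('>', i);
                i = close < 0 ? n : close + 1;
            } else if (c == '%') {                   // comment runs to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '/') {                   // name operand such as /F1 or /Im0
                i++;
                while (i < n && isRegular(s.charAt(i))) i++;
            } else if (isRegular(c)) {               // number or operator keyword
                int start = i;
                while (i < n && isRegular(s.charAt(i))) i++;
                String token = s.substring(start, i);
                if (!token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) ops.add(token);
                if (token.equals("ID")) i = endOfInlineImage(s, i);
            } else {
                i++;                                 // whitespace or a lone delimiter
            }
        }
        return ops;
    }

    // True if any text-showing operator is present.
    static boolean showsText(List<String> ops) {
        return ops.contains("Tj") || ops.contains("TJ")
                || ops.contains("'") || ops.contains("\"");
    }

    // True if an XObject is painted (Do) or an inline image begins (BI).
    // Caveat: Do can also paint a form XObject, which may itself contain text.
    static boolean paintsImage(List<String> ops) {
        return ops.contains("Do") || ops.contains("BI");
    }

    // Heuristic: skip inline image data up to the next delimited "EI".
    // Binary data could contain such a sequence by chance.
    private static int endOfInlineImage(String s, int from) {
        int at = s.indexOf("EI", from);
        while (at >= 0) {
            boolean before = at > 0 && !isRegular(s.charAt(at - 1));
            boolean after = at + 2 >= s.length() || !isRegular(s.charAt(at + 2));
            if (before && after) return at;          // resume at EI so it is tokenized
            at = s.indexOf("EI", at + 1);
        }
        return s.length();
    }

    // Regular characters are anything except PDF whitespace and delimiters.
    private static boolean isRegular(char c) {
        return "\0\t\n\f\r ()<>[]{}/%".indexOf(c) < 0;
    }
}
```

A page where showsText(...) is false and paintsImage(...) is true is a
candidate "image only" page. Because strings are skipped as operands, a "Tj"
that appears inside document text is not mistaken for an operator.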

Good Luck!
-AJ
  ----- Original Message ----- 
  From: Leonard Rosenthol 
  To: Post all your questions about iText here 
  Sent: Friday, February 23, 2007 7:06 AM
  Subject: Re: [iText-questions] how to identify image only pdf?


  On Feb 22, 2007, at 11:36 PM, a b wrote:

  > Hi, I have already made the PDF file, but I need to
  > identify whether it is an image-only PDF or whether it
  > also contains some text. Can the iText API identify
  > the difference? Please let me know.
  >
  There is no high-level API in iText to do it, though you can get
  all the necessary information from iText by iterating over each page
  and examining the content streams of each page to see what's there.
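
The page-by-page loop described above might look roughly like the sketch
below. It assumes the decoded content of each page has already been collected
into a list of byte arrays (in iText, PdfReader.getPageContent(pageNumber) for
pages 1 through getNumberOfPages() looks like the way to get them, but treat
that as an assumption to check against your iText version). ImageOnlyCheck and
the hasText predicate are hypothetical names; hasText stands in for whatever
content-stream check you settle on.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper (not an iText class).
final class ImageOnlyCheck {

    // Returns false as soon as any page's content stream is judged to contain
    // text; returns true otherwise (including for an empty page list).
    static boolean isImageOnly(List<byte[]> pageContents, Predicate<byte[]> hasText) {
        for (byte[] content : pageContents) {
            if (hasText.test(content)) {
                return false;   // one page with text is enough to rule it out
            }
        }
        return true;
    }
}
```

Passing the check in as a predicate keeps the page loop separate from the
harder question of how to tell document text from formatting operators.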



  Leonard


  -------------------------------------------------------------------------
  Take Surveys. Earn Cash. Influence the Future of IT
  Join SourceForge.net's Techsay panel and you'll get the chance to share your
  opinions on IT & business topics through brief surveys-and earn cash
  http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
  _______________________________________________
  iText-questions mailing list
  [email protected]
  https://lists.sourceforge.net/lists/listinfo/itext-questions
  Buy the iText book: http://itext.ugent.be/itext-in-action/