Hmmm - well, that is certainly a compelling idea.  We currently use a commercial OCR engine in one of our products, but it has the limitation of converting the PDF to a full image prior to processing it, then exporting the result to PDF.  Any annotation or form content gets lost along the way.  If a solution could be developed that *only* processed the images, that would be quite desirable indeed.
 
So the solution would consist of extracting all image XObjects, constructing a final image with *only* that content, running that through OCR, obtaining the text and its placement, then writing it back as invisible text on a top layer in the PDF.  (I suppose that each image could be processed in isolation from the others, but many of the commercial engines charge per page.)
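
For the per-image variant, the placement step comes down to mapping OCR pixel coordinates back into PDF user space. A minimal sketch, assuming the image's current transformation matrix [a b c d e f] and its pixel dimensions are already known from the content parser; the class name `ImagePlacement` and its method are hypothetical and not part of iText:

```java
/**
 * Hypothetical helper: maps a pixel position reported by an OCR engine
 * (origin at the top-left of the image, y pointing down) into PDF user
 * space, where an image XObject is a unit square transformed by the
 * matrix [a b c d e f] (origin at the bottom-left, y pointing up).
 */
final class ImagePlacement {
    final double a, b, c, d, e, f;       // CTM in effect when the image is painted
    final int pixelWidth, pixelHeight;   // size of the image in pixels

    ImagePlacement(double a, double b, double c, double d, double e, double f,
                   int pixelWidth, int pixelHeight) {
        this.a = a; this.b = b; this.c = c;
        this.d = d; this.e = e; this.f = f;
        this.pixelWidth = pixelWidth;
        this.pixelHeight = pixelHeight;
    }

    /** Returns {x, y} in PDF user space for the given pixel position. */
    double[] toPageSpace(double px, double py) {
        // Normalize to the unit square and flip the y axis.
        double u = px / pixelWidth;
        double v = 1.0 - py / pixelHeight;
        // Apply the affine transform defined by the matrix.
        return new double[] { a * u + c * v + e, b * u + d * v + f };
    }
}
```

Each word box returned by the OCR engine would be pushed through `toPageSpace` to decide where the invisible text goes.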
 
Tesseract's performance is still not nearly good enough to compete with the big commercial packages - but maybe a solution could be developed that would use a pluggable OCR engine.  It should be fairly straightforward to abstract out a common interface...
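
A rough sketch of what such a common interface might look like, in Java since that is what iText is written in. Every name here (`OcrWord`, `OcrEngine`, `StubOcrEngine`, `OcrPipeline`) is hypothetical; the stub returns canned output and only marks the place where a Tesseract or commercial adapter would go:

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** One recognized word and its bounding box in image pixels (top-left origin). */
final class OcrWord {
    final String text;
    final int x, y, width, height;

    OcrWord(String text, int x, int y, int width, int height) {
        this.text = text;
        this.x = x; this.y = y;
        this.width = width; this.height = height;
    }
}

/** Hypothetical common interface that each OCR back end would implement. */
interface OcrEngine {
    String name();
    List<OcrWord> recognize(BufferedImage image);
}

/** Stand-in engine for illustration; it does no real recognition. */
final class StubOcrEngine implements OcrEngine {
    public String name() { return "stub"; }

    public List<OcrWord> recognize(BufferedImage image) {
        List<OcrWord> words = new ArrayList<>();
        // Pretend the whole image is a single word.
        words.add(new OcrWord("hello", 0, 0, image.getWidth(), image.getHeight()));
        return words;
    }
}

/** Calling code depends only on the interface, so engines can be swapped. */
final class OcrPipeline {
    private final OcrEngine engine;

    OcrPipeline(OcrEngine engine) { this.engine = engine; }

    List<OcrWord> process(BufferedImage image) {
        return engine.recognize(image);
    }
}
```

Swapping engines then means passing a different `OcrEngine` implementation to `OcrPipeline`; nothing else in the calling code changes.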
 
- K
 
----------------------- Original Message -----------------------
  
From: Schalück, Elmar<[EMAIL PROTECTED]>
Cc: 
Date: Thu, 6 Nov 2008 20:01:05 +0100
Subject: [iText-questions] Comments on com.lowagie.text.pdf.parser addition to SVN
  
Hi,
another idea was to extract the images, run OCR on them (e.g. with Tesseract), and put the text behind the image so that the document is easier to search.
Elmar

>Kevin Day wrote:
>> I saw mention a bit earlier of using the content parser to determine
>> embedded image (or other xobject) location, etc...  I'm wondering about
>> some examples where this would be useful (outside of maybe trying to
>> actually render a PDF page into a JComponent or a TIFF image).
>It's a question that surfaced a couple of times on the mailing list.
>I believe that the requirement was to add text on top of the images.

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
