We use an OCR product to generate a PDF from a TIF, containing the original image plus hidden text, so that you can search and select the text but see only the originally scanned image. We then use Adobe FlashPaper 2 to turn the PDF into a SWF that can be embedded in a web page. However, the hidden text is stripped out of the final SWF, so it is no longer searchable. Adobe considers this a "limitation" (we consider it a "bug"). Most other OCR software has the same problem as the product we chose, but there is one whose output converts to SWF just fine.

To find out what differs between the two files, I tried the Tree Viewer from iText to examine their contents. However, when I select the Content node of the file that gets its text stripped out, I don't see anything. If I use the API to try to extract the stream directly, I get a NullPointerException.

So I guess I really have two questions.

1) Is there something wrong with how the PDF is constructed that prevents us from examining the text content with iText, or is there a bug in iText?

2) Is there a way we can manipulate the PDF from the OCR software we chose to make it structurally look like the one that actually keeps the text when converted to SWF?

I'm attaching a zip file with copies of the two files: 0112_094_no_text_select.pdf, from our selected OCR product, whose text content we cannot view, and 0112_094_text_select.pdf, from the other product, whose text content we CAN view and which actually keeps its text in the SWF.

OK, it seems I can't attach a file, or the message gets refused. I've uploaded it to http://www.sharebigfile.com/file/116699/0112-094-zip.html
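
For the first question, one way to look at the two files without going through iText at all is to scan the raw bytes for streams and check which of them contain text objects. The following is only a rough sketch in Python, not an iText technique: the function names are made up for illustration, it assumes the content streams are either uncompressed or Flate-compressed, and it ignores encryption, object streams, and other filters. It relies on the PDF convention that text is drawn between the BT and ET operators and that "3 Tr" selects text rendering mode 3 (invisible text), which is how OCR products typically store the hidden layer. Binary data can contain those byte sequences by chance, so treat a hit as a hint rather than proof.

```python
import re
import zlib

# "stream" keyword (but not the tail of "endstream"), an end-of-line,
# then the stream body up to the next "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def iter_streams(pdf_bytes):
    """Yield each stream body, inflated when it is Flate-compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line left after the data
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate data (or a filter this sketch ignores)
        yield data


def summarize_text_streams(pdf_bytes):
    """Return one dict per stream that appears to hold a BT ... ET text object."""
    results = []
    for data in iter_streams(pdf_bytes):
        if re.search(rb"\bBT\b.*?\bET\b", data, re.DOTALL):
            results.append({
                "length": len(data),
                # True when the stream switches to invisible text (mode 3)
                "invisible_text": bool(re.search(rb"\b3\s+Tr\b", data)),
            })
    return results
```

Running summarize_text_streams(open("0112_094_no_text_select.pdf", "rb").read()) and the same call on 0112_094_text_select.pdf should show whether both files carry a text layer in a plain content stream. If the first file reports nothing, the text may be stored somewhere this sketch does not look, which could also be relevant to what the Tree Viewer shows.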

iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions