----- Original Message -----
From: "Jake C" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, March 20, 2007 7:41 PM
Subject: Re: [iText-questions] Can't view text segments in certain PDF files

> What did you do to the original document to create your modified version?

Removed the invisible text rendering.

> Why can't the TreeViewPDF tool view the Content of either my original
> version or your modified version? Is it possible to make the structure of

It probably has limitations. You should look at
http://www.windjack.com/products/pdfcanopener.html.

> one that doesn't convert to FlashPaper to look like the one that DOES
> convert to FlashPaper using iText?
>

That's something that can't be done without the Flash environment and it
goes somewhat above the scope of this mailing list.

Paulo

>>From: "Paulo Soares" <[EMAIL PROTECTED]>
>>Reply-To: Post all your questions about iText here
>><[email protected]>
>>To: "Post all your questions about iText here"
>><[email protected]>
>>Subject: Re: [iText-questions] Can't view text segments in certain PDF
>>files
>>Date: Tue, 20 Mar 2007 18:50:46 -0000
>>
>>The text pasted from the PDF to the clipboard is correct. It will probably
>>require more investigation but not related to iText.
>>
>>Paulo
>>
>>----- Original Message -----
>>From: "Jake C" <[EMAIL PROTECTED]>
>>To: <[email protected]>
>>Sent: Tuesday, March 20, 2007 6:33 PM
>>Subject: Re: [iText-questions] Can't view text segments in certain PDF
>>files
>>
>>
>> > No, there is actually text there now, but not a single one is
>> > alphanumeric.
>> > I'm pasting in the text that I copied/pasted into notepad:
>> >
>> > ¿½½·¼»²¬ó·²·¬·¿¬·²¹
>> > «¬·´·¬§
>> > ª»²¼±®ó«°°´·»¼ ¬¸»®³¿´ó¸§¼®¿«´·½ ½¿´½«´¿ó
>> > ׬
>> > ·²
>> > °´¿²¬
>> > ±³·¬¬»¼ô
>> > ¼»ª»´±°»¼
>> > ¼·®»½¬´§
>> > °®»½»¼·²¹
>> > ««¿´´§
>> > ¾§
>> > ª»®§
>> > °´¿²¬ ¼»·¹²
>> > ¿½½·¼»²¬ó·²·¬·¿¬·²¹
>> > »²¹·²»»®·²¹
>> > ·
>> > ª¿´«¿¾´»
>> > ®·µó¿»³»²¬ °®±½»ô
>> > ¿
>> > ©±«´¼
>> > ¿ º±®³¿´´§ ¼±½«³»²¬»¼
>> > ß²¿´§·
>> > Ú«²½¬·±² Ûª»²¬
>> > °´¿²¬
>> > ³·¬·¹¿¬·²¹
>> > ·²·¬·¿¬·²¹
>> > ¬®¿²´¿¬»¼
>> > °»®º±®³·²¹
>> > °®±ª·¼»
>> > °®»°¿®·²¹
>> > ³±®»
>> > ¼»¬¿·´»¼
>> > Ú«²½¬·±²
>> > ¿ ¼·¬·²½¬´§
>> > °´¿²¬
>> > »ª»²¬
>> > °®±ª·¼» ¿
>> > ¾¿»´·²»
>> > °»®³·¬ ¿
>> > ¿°°®±¿½¸
>> > ¾»¬©»»²
>> > ³·¬·¹¿¬·²¹
>> > °´¿²¬
>> > ¿
>> > »ª»²¬«¿´´§ ¼»½±³°±»¼
>> > ±®
>> > «²¿ª¿·´¿¾·´·¬§
>> > ¯«¿²¬·¬¿¬·ª»´§ ³»¿«®»¼ò
>> > ½±²¬®«½¬·²¹
>> > °®»ª»²¬
>> > ½±²»¯«»²½»ô
>> > ®»´¿¬·±²¸·°
>> > ¾»¬©»»²
>> > ·²ª»²¬±®§
>> > ³¿·²ó
>> > ¬¿·²»¼ô
>> > ¿½½±³°´·¸»¼ò
>> > ·²
>> > »´·³·²¿¬·²¹
>> > ¸»¿¬ó®»³±ª¿´
>> > «½½»º«´´§ ³¿·²¬¿·²»¼ò
>> > ¿
>> > ¿
>> > ÔÑÝßò
>> > ½±²·¼»®»¼
>> > ïò
>> > îò ݱ²¬¿·²³»²¬ ±ª»®°®»«®»
>> > ¾´±©¼±©² ¾§
>> >
>> >
>> >>From: "Paulo Soares" <[EMAIL PROTECTED]>
>> >>Reply-To: Post all your questions about iText here
>> >><[email protected]>
>> >>To: "Post all your questions about iText here"
>> >><[email protected]>
>> >>Subject: Re: [iText-questions] Can't view text segments in certain PDF
>> >>files
>> >>Date: Tue, 20 Mar 2007 18:02:04 -0000
>> >>
>> >>See if it works now.
>> >>
>> >>Paulo
>> >>
>> >>----- Original Message -----
>> >>From: "Jake C" <[EMAIL PROTECTED]>
>> >>To: <[email protected]>
>> >>Sent: Tuesday, March 20, 2007 4:44 PM
>> >>Subject: [iText-questions] Can't view text segments in certain PDF
>> >>files
>> >>
>> >>
>> >>>We use an OCR product to generate a PDF from a TIF with the original
>> >>>image plus hidden text, so that you can search/select the text, but
>> >>>only see the originally scanned image.
>> >>>We then use Adobe FlashPaper 2 to turn it into a SWF that can be
>> >>>imbedded in a web page. However, the hidden text is being stripped out
>> >>>of the final SWF, so that it is no longer searchable. Adobe considers
>> >>>this a "limitation" (we consider it a "bug"). Most other OCR software
>> >>>has the same problem as the platform we chose, but there is one that
>> >>>seems to convert to SWF just fine. In an attempt to find out what the
>> >>>difference was between the two files, I tried to use the Tree Viewer
>> >>>from iText to examine the contents of the files. However, when I
>> >>>select the Content node of the one that gets the text stripped out, I
>> >>>don't see anything. If I use the API to try to extract the Stream
>> >>>directly, I get a NullPointerException.
>> >>>
>> >>>So I guess I really have two questions.
>> >>>
>> >>>1) Is there something wrong with how the PDF is constructed that we
>> >>>cannot examine the text content with iText, or is there a bug in iText?
>> >>>
>> >>>2) Is there a way we can manipulate the PDF from the OCR software we
>> >>>chose to make it structurally look like the one that actually keeps the
>> >>>text when converted to SWF?
>> >>>
>> >>>I'm attaching a copy of the two files (0112_094_no_text_select.pdf from
>> >>>our selected OCR product, which we cannot view the text content, and
>> >>>0112_094_text_select.pdf from the other product, which we CAN view the
>> >>>text content, and actually keeps the text in the SWF) in a zip file.
>> >>>
>> >>>OK, it seems I can't attach a file, or the message gets refused. I've
>> >>>uploaded it to
>> >>>http://www.sharebigfile.com/file/116699/0112-094-zip.html
>> >>>
>> >><< 0112_094_no_text_select_mod.pdf >>

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://itext.ugent.be/itext-in-action/
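
A note on the text pasted in the 6:33 PM message: the sample looks like a fixed one-to-one substitution rather than random damage. In the pasted sample, each character in the range U+00A2..U+00FF appears to stand for the ASCII character whose code is 0x120 minus its code point. This is only an observation from the sample quoted in this thread, not something anyone in the thread confirmed, and it says nothing about why the PDF's font encoding produces it. A minimal Python sketch of that assumed mapping (the function name `decode_pasted_text` is hypothetical):

```python
def decode_pasted_text(text: str) -> str:
    """Undo the substitution assumed from the sample pasted in this thread.

    Assumption: every code point c in U+00A2..U+00FF stands for the ASCII
    character chr(0x120 - c). Everything else (spaces, newlines, quote
    markers) is passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xA2 <= cp <= 0xFF:
            out.append(chr(0x120 - cp))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # First line of the pasted sample, written with escapes so the script
    # does not depend on the encoding of this message.
    sample = (
        "\u00bf\u00bd\u00bd\u00b7\u00bc\u00bb\u00b2\u00ac\u00f3"
        "\u00b7\u00b2\u00b7\u00ac\u00b7\u00bf\u00ac\u00b7\u00b2\u00b9"
    )
    print(decode_pasted_text(sample))
```

Under the same assumption, the letter "s" would correspond to U+00AD (soft hyphen), which editors often drop or hide. That may be why words in the sample seem to be missing a letter after decoding.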
