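PDF content streams are more complicated than you may think. In particular, a PDF can specify a mapping table (called a CMap) that translates the character codes used in a content stream into characters. Crystal Reports uses CMaps in its output (as do many other PDF generators, especially when international character sets are being used). That would explain why decoding the raw page bytes as ASCII, UTF-8 or Unicode gives you nothing readable: the bytes in the text operators are codes that only make sense once they have been run through the font's mapping.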
Lots of info is here, on page 442:
At the present time, I'm not sure about iText's support for CMap type objects. There are several static functions in PdfEncodings that may do the trick, but I'm not aware of any documentation related to processing a content stream using CMaps - if there is any such info, I'd love to know about it.
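To make the CMap idea concrete, here is a minimal sketch in plain Python (not iText or iTextSharp code) of how such a mapping table gets applied. The sample CMap text, the function names parse_bfchar and decode_hex_string, and the fixed 2-byte code width are all assumptions made for illustration. A real ToUnicode CMap can also contain beginbfrange sections and variable-width codes, which this sketch ignores.

```python
import re

# Hypothetical ToUnicode CMap fragment, roughly as it might appear in a
# font's /ToUnicode stream. The codes and targets are invented.
SAMPLE_CMAP = """
/CIDInit /ProcSet findresource begin
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
3 beginbfchar
<0001> <0048>
<0002> <0069>
<0003> <0021>
endbfchar
endcmap
"""


def parse_bfchar(cmap_text):
    """Return {character code: text} from the beginbfchar sections only."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for src, dst in pairs:
            # The destination is UTF-16BE text written as hex digits.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_hex_string(hex_operand, mapping, code_bytes=2):
    """Decode a hex string operand such as <000100020003>.

    Assumes every code is exactly code_bytes wide; unmapped codes become
    U+FFFD so that gaps in the mapping stay visible.
    """
    digits = hex_operand.strip("<>")
    step = code_bytes * 2
    codes = [int(digits[i:i + step], 16) for i in range(0, len(digits), step)]
    return "".join(mapping.get(code, "\ufffd") for code in codes)


cmap = parse_bfchar(SAMPLE_CMAP)
print(decode_hex_string("<000100020003>", cmap))  # prints "Hi!"
```

The point is that the bytes <000100020003> contain no recognizable text in any standard encoding. They only turn into "Hi!" after the lookup.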
Note also that PDF text operations do not necessarily lend themselves to direct text comparison. It's quite possible to have one half of a word in one text operation and the other half of the word in another text operation. You have to do spatial analysis on the text blocks to determine which words actually go together and in what order.
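A rough sketch of that kind of spatial analysis follows, again in plain Python rather than iText. It assumes you have already decoded each text operation and know its position and width. The Fragment type, the sample coordinates, and both tolerance values are hypothetical, and real documents (rotated text, multiple columns, kerning adjustments) need considerably more care.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str     # already-decoded text of one text operation
    x: float      # left edge, in page units
    y: float      # baseline, in page units
    width: float  # advance width of the fragment


def group_into_lines(fragments, y_tolerance=2.0):
    """Group fragments with nearly equal baselines, top of page first."""
    lines = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, read left to right.
    return [sorted(line, key=lambda f: f.x) for line in lines]


def join_line(line, gap_threshold=1.5):
    """Concatenate fragments, adding a space only across a visible gap."""
    parts = [line[0].text]
    for prev, cur in zip(line, line[1:]):
        gap = cur.x - (prev.x + prev.width)
        parts.append((" " if gap > gap_threshold else "") + cur.text)
    return "".join(parts)


# Invented fragments, deliberately out of reading order, with the word
# "Invoice" split across two text operations.
fragments = [
    Fragment("voice", 124.0, 700.0, 28.0),
    Fragment("In", 112.0, 700.0, 12.0),
    Fragment("Total", 112.0, 680.0, 26.0),
    Fragment("Number", 160.0, 700.4, 40.0),
]

for line in group_into_lines(fragments):
    print(join_line(line))
# prints:
# Invoice Number
# Total
```

Only after this kind of reassembly does it make sense to search the text for a delimiter pattern.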
- K
----------------------- Original Message -----------------------
From: Vinoo <[EMAIL PROTECTED]>
Cc:
Date: Wed, 29 Oct 2008 08:55:13 -0700 (PDT)
Subject: [iText-questions] URGENT : Help with parsing the PDF generated by Crystal reports-V9
Hi,

I am trying to parse the contents of a PDF with iTextSharp using:

    PdfReader reader = new PdfReader("Test.pdf");
    byte[] pageContentByteArray = reader.GetPageContent(pageNumber);

I use this byte array to search for a particular text based on a delimiter pattern, after converting it to a string with:

    string test = Encoding.ASCII.GetString(pageContentByteArray);

I am able to match the required text pattern inside the string generated by the above statement. This logic works absolutely fine with a normal PDF input file.

My requirement is to read a PDF file created by CRYSTAL REPORTS (Version 9). I have the byte array of the page, and I tried converting it to a string using ASCII, Unicode, UTF8 and UnicodeBig:

    string test = Encoding.ASCII.GetString(invoicePageContentByteArray);
    string test = Encoding.Unicode.GetString(invoicePageContentByteArray);
    string test = Encoding.UTF8.GetString(invoicePageContentByteArray);
    ..... also using UnicodeBig

The output is not in a readable format. I could not find any of the text on the page in the output string. I guess the PDF generated by Crystal Reports uses some other encoding format.

(Note: We verified the template used by Crystal Reports to generate the PDF. The search delimiter pattern is defined as a Text object.)

There should be some way of doing the above. I am not sure what I am missing here. Can anyone please suggest ideas to resolve this problem?

--
Regards,
Uma

--
View this message in context: http://www.nabble.com/URGENT-%3A-Help-with-parsing-the-PDF-generated-by-Crystal-reports-V9-tp20229737p20229737.html
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.1t3xt.com/docs/book.php