Re: [iText-questions] URGENT : Help with parsing the PDF generated by Crystal reports-V9

Kevin Day Wed, 29 Oct 2008 10:03:47 -0700

PDF content streams are more complicated than you may think. In particular, a PDF can specify a mapping table that translates character codes (this is called a CMap). Crystal Reports uses CMaps in its output, as do many other PDF generators, especially when international character sets are being used.
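To make the CMap point concrete, here is a minimal sketch in plain Python (no iText involved) of how a ToUnicode-style CMap turns the codes in a content stream into readable text. The CMap fragment and the hex string are made up for illustration, the helper names parse_bfchar and decode_hex_string are hypothetical, and only `bfchar` entries with 2-byte codes are handled; a real CMap can also contain `bfrange` sections and other code lengths.

```python
import re

# Hypothetical ToUnicode CMap fragment, invented for this example.
SAMPLE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
3 beginbfchar
<0001> <0048>
<0002> <0069>
<0003> <0021>
endbfchar
endcmap
"""

def parse_bfchar(cmap_text):
    """Return {character code: Unicode string} for every bfchar entry."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            # The destination value is UTF-16BE encoded text.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode_hex_string(hex_string, mapping, code_bytes=2):
    """Decode the hex operand of a text operator, e.g. <000100020003> Tj."""
    raw = bytes.fromhex(hex_string)
    codes = [int.from_bytes(raw[i:i + code_bytes], "big")
             for i in range(0, len(raw), code_bytes)]
    # Codes without a mapping become U+FFFD so the gap stays visible.
    return "".join(mapping.get(code, "\ufffd") for code in codes)

mapping = parse_bfchar(SAMPLE_CMAP)
decoded = decode_hex_string("000100020003", mapping)  # "Hi!"

# Reading the same bytes directly as ASCII yields only control characters,
# which is why a plain Encoding.ASCII-style conversion finds no text.
naive = bytes.fromhex("000100020003").decode("ascii")
```

The codes in the stream are indexes into the font's own table, not character values, so no choice of standard text encoding will recover the text without the mapping.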

Lots of info is here, on page 442:

http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf

At the present time, I'm not sure about iText's support for CMap type objects. There are several static functions in PdfEncodings that may do the trick, but I'm not aware of any documentation related to processing a content stream using CMaps - if there is any such info, I'd love to know about it.

Note also that PDF text operations do not necessarily lend themselves to direct text comparison. It's quite possible to have one half of a word in one text operation and the other half of the word in another text operation. You have to do spatial analysis on the text blocks to determine which words actually go together and in what order.
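As a rough illustration of that kind of spatial analysis, the sketch below (plain Python, not iText code) groups text chunks by baseline, sorts each line left to right, and inserts a space only where the horizontal gap is large enough. The Chunk type, the coordinates, and both thresholds are assumptions invented for the example; real values would come from the text matrix and font metrics in the content stream.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    x: float      # left edge of the chunk
    y: float      # baseline position (PDF y grows upward)
    width: float  # rendered width of the chunk

def assemble_lines(chunks, y_tolerance=2.0, gap_threshold=1.5):
    """Rebuild lines of text from chunks that may arrive in any order."""
    lines = []  # list of (baseline_y, [chunks on that baseline])
    # Visit chunks from the top of the page down.
    for chunk in sorted(chunks, key=lambda c: -c.y):
        if lines and abs(lines[-1][0] - chunk.y) <= y_tolerance:
            lines[-1][1].append(chunk)
        else:
            lines.append((chunk.y, [chunk]))

    result = []
    for _, members in lines:
        members.sort(key=lambda c: c.x)
        text = members[0].text
        for prev, cur in zip(members, members[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A small gap means the chunks are pieces of the same word.
            text += (" " if gap > gap_threshold else "") + cur.text
        result.append(text)
    return result

# Hypothetical chunks: "Invoice" is split across two text operations
# and the operations are not in reading order.
chunks = [
    Chunk("voice", 118.0, 700.0, 25.0),
    Chunk("In", 100.0, 700.0, 18.0),
    Chunk("Total", 100.0, 680.0, 28.0),
    Chunk("Number", 150.0, 700.0, 40.0),
]
lines = assemble_lines(chunks)  # ["Invoice Number", "Total"]
```

Searching the reassembled lines for a delimiter pattern is far more reliable than searching the raw operator stream, where "In" and "voice" never appear next to each other.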

- K

----------------------- Original Message -----------------------

From: Vinoo <[EMAIL PROTECTED]>

To: itext-questions@lists.sourceforge.net

Cc:

Date: Wed, 29 Oct 2008 08:55:13 -0700 (PDT)

Subject: [iText-questions] URGENT : Help with parsing the PDF generated by Crystal reports-V9

Hi,
I am trying to parse the contents of the PDF with iTextSharp using:
PdfReader reader = new PdfReader("Test.pdf");
byte[] pageContentByteArray = reader.GetPageContent(pageNumber);
I am using this byte array to search for a particular text based on a
delimiter pattern, after converting it to a string with:
string test = Encoding.ASCII.GetString(pageContentByteArray);
I am able to match the required text pattern inside the string generated
using the above statement. The above logic works absolutely fine if we use a
normal PDF input file.
My requirement is to read a PDF file which is created by CRYSTAL REPORTS
(Version-9).
I have the byte array of the page, and I tried to convert it to a string
using ASCII, Unicode, UTF8 and UnicodeBig:
string test = Encoding.ASCII.GetString(invoicePageContentByteArray);
string test = Encoding.Unicode.GetString(invoicePageContentByteArray);
string test = Encoding.UTF8.GetString(invoicePageContentByteArray);
..... also using UnicodeBig

The output is not in a readable format. I could not find any of the text on
the page in the output string. I guess the PDF generated by Crystal Reports
is using some other encoding format.
(Note: We verified the template used by Crystal Reports to generate the
PDF. The search delimiter pattern is defined as a Text object.)
There should be some way of doing the above. I am not sure what I am
missing here. Can anyone please suggest ideas to resolve the above problem?

--
Regards,
Uma
--
View this message in context: http://www.nabble.com/URGENT-%3A-Help-with-parsing-the-PDF-generated-by-Crystal-reports-V9-tp20229737p20229737.html
Sent from the iText - General mailing list archive at Nabble.com.

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
