Do you get the right characters, irrespective of ordering, when using
PdfTextExtractor? If you don't, the CMap won't help you, as it's
already used in text extraction. Can you post a PDF?

Paulo
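
For readers following the thread, here is a minimal sketch of what a
ToUnicode CMap lookup amounts to once the stream referenced by /ToUnicode
(30 0 R in the font dictionary quoted below) has been decompressed to text.
It uses only the Java standard library, not iText. The class name
ToUnicodeSketch, its methods, and above all the SAMPLE_CMAP contents are
assumptions made for illustration: the code-to-Unicode pairs are invented and
are not the real mappings of the PDF under discussion. The array form of
bfrange is not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse bfchar/bfrange sections of a ToUnicode CMap and decode 2-byte codes. */
final class ToUnicodeSketch {

    // HYPOTHETICAL sample data: these mappings are invented for illustration
    // and are NOT taken from the PDF discussed in the thread.
    static final String SAMPLE_CMAP = String.join("\n",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
        "2 beginbfchar",
        "<0003> <0020>",
        "<03A2> <0645>",
        "endbfchar",
        "1 beginbfrange",
        "<039B> <039F> <0627>",
        "endbfrange");

    private static final Pattern BFCHAR =
        Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);
    private static final Pattern BFRANGE =
        Pattern.compile("beginbfrange(.*?)endbfrange", Pattern.DOTALL);
    private static final Pattern PAIR =
        Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");
    private static final Pattern TRIPLE =
        Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    /** Converts a UTF-16BE hex string (groups of 4 hex digits) to a Java string. */
    static String utf16FromHex(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    /** Builds a code-to-Unicode table from the bfchar and simple bfrange sections. */
    static Map<Integer, String> parse(String cmap) {
        Map<Integer, String> map = new LinkedHashMap<>();
        Matcher section = BFCHAR.matcher(cmap);
        while (section.find()) {
            Matcher m = PAIR.matcher(section.group(1));
            while (m.find()) {
                map.put(Integer.parseInt(m.group(1), 16), utf16FromHex(m.group(2)));
            }
        }
        section = BFRANGE.matcher(cmap);
        while (section.find()) {
            Matcher m = TRIPLE.matcher(section.group(1));
            while (m.find()) {
                int lo = Integer.parseInt(m.group(1), 16);
                int hi = Integer.parseInt(m.group(2), 16);
                String first = utf16FromHex(m.group(3));
                String head = first.substring(0, first.length() - 1);
                char last = first.charAt(first.length() - 1);
                // Each successive code maps to the next value of the last UTF-16 unit.
                for (int code = lo; code <= hi; code++) {
                    map.put(code, head + (char) (last + (code - lo)));
                }
            }
        }
        return map;
    }

    /** Decodes one hex string operand, assuming 2-byte codes (Identity-H). */
    static String decode(String hexOperand, Map<Integer, String> map) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hexOperand.length(); i += 4) {
            int code = Integer.parseInt(hexOperand.substring(i, i + 4), 16);
            sb.append(map.getOrDefault(code, "\uFFFD")); // unmapped code
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> map = parse(SAMPLE_CMAP);
        // TJ operand copied from the content stream quoted in the thread.
        Matcher hex = Pattern.compile("<([0-9A-Fa-f]+)>").matcher("[<0003>4<03A2>5<039F039B>]");
        StringBuilder text = new StringBuilder();
        while (hex.find()) {
            text.append(decode(hex.group(1), map));
        }
        // Print code points rather than glyphs to avoid console encoding issues.
        for (char c : text.toString().toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Note that the decoded characters come out in content-stream order, which for
Arabic is not necessarily reading order. That is the ordering question raised
above, and a raw CMap lookup does not resolve it.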

On Wed, Jun 26, 2013 at 6:20 PM, Mohammed Mostafa
<mohammed_mostafa1...@hotmail.com> wrote:
> Hello Mr Paulo,
>
> I know the direct way of extracting text, but the problem is that I want to
> extract Arabic text from a PDF.
> When I read the page content stream with iText's PRStream, the Arabic text
> comes out as strange codes (038f-00ac), and I want to convert these codes to
> the original Unicode using the CMap.
> My question: where is the CMap in the font dictionary?
> The stream I get with iText is:
>
> /TagSuspect <</TagSuspect /Ordering >>BDC  /P <</MCID 0/Lang (ar-EG)>> BDC
> BT
> /F1 14.04 Tf
> 1 0 0 1 518.02 707.14 Tm
> /GS10 gs
> 0 g
> /GS11 gs
> 0 G
> [<0003>4<03A2>5<039F039B>] TJ
> ...
> <object number="5" category="DICTIONARY" type="/Font" subtype="/Type0">
>    <DICTIONARY>
>       <INDIRECT key="/DescendantFonts" number="6" generation="0" value="6 0 R" />
>       <NAME key="/BaseFont" value="/Arial" />
>       <NAME key="/Type" value="/Font" />
>       <NAME key="/Encoding" value="/Identity-H" />
>       <NAME key="/Subtype" value="/Type0" />
>       <INDIRECT key="/ToUnicode" number="30" generation="0" value="30 0 R" />
>    </DICTIONARY>
> </object>
>
> Where is the CMap itself, so that I can map these character codes to their
> Unicode values?
>
>> Date: Wed, 26 Jun 2013 17:52:05 +0100
>> From: pgpsoa...@gmail.com
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Extract CMap from pdf file!
>
>>
>> This is an easy one: the ToUnicode CMap is in the font dictionary. You
>> can get the font dictionary from the page resources. Of course,
>> there's a direct way to extract text from a PDF using iText without
>> having to reinvent the wheel.
>>
>> Paulo
>>
>> On Wed, Jun 26, 2013 at 5:30 PM, Mohammed Mostafa
>> <mohammed_mostafa1...@hotmail.com> wrote:
>> > Hello All,
>> >
>> > How can I extract the ToUnicode CMap from a PDF file using the iText
>> > library?
>> >
>> > I am using iText's PRStream to retrieve the page stream from the PDF, but
>> > the page stream does not include the CMap.
>> >
>> > I look forward to your prompt reply.
>> >
>> > Thanks,
>> > Mohammed
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > This SF.net email is sponsored by Windows:
>> >
>> > Build for Windows Store.
>> >
>> > http://p.sf.net/sfu/windows-dev2dev
>> > _______________________________________________
>> > iText-questions mailing list
>> > iText-questions@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/itext-questions
>> >
>> > iText(R) is a registered trademark of 1T3XT BVBA.
>> > Many questions posted to this list can (and will) be answered with a
>> > reference to the iText book: http://www.itextpdf.com/book/
>> > Please check the keywords list before you ask for examples:
>> > http://itextpdf.com/themes/keywords.php
