Do you get the right characters, irrespective of their ordering, when using PdfTextExtractor? If you don't, the ToUnicode CMap won't help you, because it's already used during text extraction. Can you post the PDF?
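For illustration only, here is a minimal stdlib-only Java sketch of the kind of lookup a ToUnicode CMap provides, and which PdfTextExtractor already performs for you. Each 2-byte code from an Identity-H string operand is looked up in the CMap's bfchar/bfrange tables. The class and method names (ToUnicodeSketch, parse, decode) and the sample CMap are hypothetical. This is not iText's implementation, and it ignores array-form bfrange entries, multi-unit range destinations and codespace ranges. Note that in a TJ array such as [<0003>4<03A2>5<039F039B>], the numbers (4 and 5) are positioning adjustments, not characters, so only the <...> strings would be decoded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Sketch: maps 2-byte codes (as used with an Identity-H Type0 font) to Unicode
 * using a ToUnicode CMap. Handles only bfchar entries and simple
 * "<lo> <hi> <dst>" bfrange entries whose destination is one UTF-16 code unit.
 */
public class ToUnicodeSketch {

    private static final Pattern CHAR_SECTION =
            Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);
    private static final Pattern RANGE_SECTION =
            Pattern.compile("beginbfrange(.*?)endbfrange", Pattern.DOTALL);
    private static final Pattern CHAR_ENTRY =
            Pattern.compile("<(\\p{XDigit}+)>\\s*<(\\p{XDigit}+)>");
    private static final Pattern RANGE_ENTRY =
            Pattern.compile("<(\\p{XDigit}+)>\\s*<(\\p{XDigit}+)>\\s*<(\\p{XDigit}+)>");

    /** Parses the bfchar and bfrange sections of a ToUnicode CMap into code -> text. */
    public static Map<Integer, String> parse(String cmap) {
        Map<Integer, String> map = new HashMap<>();
        Matcher section = CHAR_SECTION.matcher(cmap);
        while (section.find()) {
            Matcher e = CHAR_ENTRY.matcher(section.group(1));
            while (e.find()) {
                map.put(Integer.parseInt(e.group(1), 16), utf16Hex(e.group(2)));
            }
        }
        section = RANGE_SECTION.matcher(cmap);
        while (section.find()) {
            Matcher e = RANGE_ENTRY.matcher(section.group(1));
            while (e.find()) {
                int lo = Integer.parseInt(e.group(1), 16);
                int hi = Integer.parseInt(e.group(2), 16);
                int dst = Integer.parseInt(e.group(3), 16);
                for (int code = lo; code <= hi; code++) {
                    // Simplification: destination treated as a single UTF-16 code unit.
                    map.put(code, String.valueOf((char) (dst + code - lo)));
                }
            }
        }
        return map;
    }

    /** Decodes hex digits as UTF-16BE code units, four hex digits per unit. */
    private static String utf16Hex(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    /** Splits a hex string operand into 2-byte codes and maps each through the CMap. */
    public static String decode(String hexOperand, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hexOperand.length(); i += 4) {
            int code = Integer.parseInt(hexOperand.substring(i, i + 4), 16);
            sb.append(toUnicode.getOrDefault(code, "\uFFFD")); // U+FFFD marks unmapped codes
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // HYPOTHETICAL CMap: these mappings are invented for illustration and are
        // not taken from the PDF discussed in this thread.
        String cmap = "/CIDInit /ProcSet findresource begin\n"
                + "12 dict begin\n"
                + "begincmap\n"
                + "1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n"
                + "2 beginbfchar\n<0005> <0020>\n<0010> <0627>\nendbfchar\n"
                + "1 beginbfrange\n<0020> <0022> <0628>\nendbfrange\n"
                + "endcmap\n";
        Map<Integer, String> toUnicode = parse(cmap);
        String text = decode("0010000500220021", toUnicode);
        // Print code points, since the console may not have an Arabic font.
        String codes = text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
        System.out.println(codes); // prints: U+0627 U+0020 U+062A U+0629
    }
}
```

In a real document you would not hand-parse anything: in iText 5 the ToUnicode stream is reached roughly via PdfReader.getPageN(), then getAsDict(PdfName.RESOURCES) and getAsDict(PdfName.FONT), then the font dictionary's PdfName.TOUNICODE stream (readable with PdfReader.getStreamBytes()), and PdfTextExtractor.getTextFromPage() already applies that mapping.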
Paulo

On Wed, Jun 26, 2013 at 6:20 PM, Mohammed Mostafa
<mohammed_mostafa1...@hotmail.com> wrote:
> Hello Mr Paulo,
>
> I know the direct way of extracting text, but the problem is that I want to
> extract Arabic text from a PDF. When I extract text from the PDF using iText,
> I get the page stream as an iText PRStream, and the Arabic text comes out as
> strange codes (038f-00ac). I want to convert these codes to the original
> Unicode by using the CMap.
> My question: where is the CMap in the font dictionary?
> The stream I get with iText is:
>
> /TagSuspect <</TagSuspect /Ordering >>BDC /P <</MCID 0/Lang (ar-EG)>> BDC
> BT
> /F1 14.04 Tf
> 1 0 0 1 518.02 707.14 Tm
> /GS10 gs
> 0 g
> /GS11 gs
> 0 G
> [<0003>4<03A2>5<039F039B>] TJ
> ...
> <object number="5" category="DICTIONARY" type="/Font" subtype="/Type0">
> <DICTIONARY>
> <INDIRECT key="/DescendantFonts" number="6" generation="0" value="6 0 R" />
> <NAME key="/BaseFont" value="/Arial" />
> <NAME key="/Type" value="/Font" />
> <NAME key="/Encoding" value="/Identity-H" />
> <NAME key="/Subtype" value="/Type0" />
> <INDIRECT key="/ToUnicode" number="30" generation="0" value="30 0 R" />
> </DICTIONARY>
> </object>
>
> Where is the CMap itself, so that I can map these character codes to their
> Unicode values?
>
>> Date: Wed, 26 Jun 2013 17:52:05 +0100
>> From: pgpsoa...@gmail.com
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Extract CMap from pdf file!
>>
>> This is an easy one: the ToUnicode cmap is in the font dictionary. You
>> can get the font dictionary from the page resources. Of course,
>> there's a direct way to extract text from a PDF using iText without
>> having to reinvent the wheel.
>>
>> Paulo
>>
>> On Wed, Jun 26, 2013 at 5:30 PM, Mohammed Mostafa
>> <mohammed_mostafa1...@hotmail.com> wrote:
>> > Hello All,
>> >
>> > How can I extract the ToUnicode CMap from a PDF file using the iText
>> > library?
>> >
>> > I am using iText PRStream to retrieve the page stream from the PDF, but
>> > the page stream doesn't include the CMap!
>> >
>> > Please reply as soon as possible...
>> >
>> > Thanks,
>> > Mohammed
>> >
>> > ------------------------------------------------------------------------------
>> > This SF.net email is sponsored by Windows:
>> >
>> > Build for Windows Store.
>> >
>> > http://p.sf.net/sfu/windows-dev2dev
>> > _______________________________________________
>> > iText-questions mailing list
>> > iText-questions@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/itext-questions
>> >
>> > iText(R) is a registered trademark of 1T3XT BVBA.
>> > Many questions posted to this list can (and will) be answered with a
>> > reference to the iText book: http://www.itextpdf.com/book/
>> > Please check the keywords list before you ask for examples:
>> > http://itextpdf.com/themes/keywords.php