I followed section 9.10.2, "Mapping Character Codes to Unicode Values", of the PDF reference to write a Font class that checks whether each character in a PDF document can be converted to Unicode. I am attaching the class to share what I have so far for this conversion. Is there an easier or better way to do it?

http://www.nabble.com/file/p25752712/Font.java Font.java
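
For readers without the attachment, here is a minimal sketch of the lookup order that section describes for a simple font: try the ToUnicode CMap first, otherwise go from character code to glyph name (via the encoding) and from glyph name to Unicode (via the Adobe Glyph List). It is not the attached Font class. The class name, the method name and the three maps are hypothetical stand-ins for data already extracted from the PDF, and falling through to the glyph list when the CMap has no entry for a code is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class UnicodeLookupSketch {

    /**
     * Tries to resolve one character code of a simple font to a Unicode string.
     * All three maps are hypothetical inputs: a parsed ToUnicode CMap (may be null),
     * the font's code-to-glyph-name encoding, and an excerpt of the Adobe Glyph List.
     */
    static Optional<String> toUnicode(int code,
                                      Map<Integer, String> toUnicodeCMap,
                                      Map<Integer, String> codeToGlyphName,
                                      Map<String, String> glyphList) {
        // Step 1: a ToUnicode CMap entry, when present, is used directly.
        if (toUnicodeCMap != null && toUnicodeCMap.containsKey(code)) {
            return Optional.of(toUnicodeCMap.get(code));
        }
        // Step 2: character code -> glyph name -> Unicode value.
        String glyphName = codeToGlyphName.get(code);
        if (glyphName != null && glyphList.containsKey(glyphName)) {
            return Optional.of(glyphList.get(glyphName));
        }
        // Neither route produced a value: the code cannot be mapped.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicodeCMap = new HashMap<>();
        toUnicodeCMap.put(1, "\uFB01"); // code 1 maps to the fi ligature

        Map<Integer, String> codeToGlyphName = new HashMap<>();
        codeToGlyphName.put(65, "A");
        codeToGlyphName.put(128, "Euro");
        codeToGlyphName.put(200, "g123"); // a name that is not in the glyph list

        Map<String, String> glyphList = new HashMap<>();
        glyphList.put("A", "\u0041");
        glyphList.put("Euro", "\u20AC");

        for (int code : new int[] {1, 65, 128, 200}) {
            Optional<String> u = toUnicode(code, toUnicodeCMap, codeToGlyphName, glyphList);
            System.out.println(code + " -> " + (u.isPresent() ? "mappable" : "not mappable"));
        }
    }
}
```

With the sample data in main, codes 1, 65 and 128 are reported as mappable and code 200 is not. Composite fonts are left out of this sketch.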

Leonard Rosenthol-3 wrote:
>
> 1) You also need to consider the case of inherited resources.
>
> 2) You are assuming that all font dicts will be indirect; that's not
> necessarily true.
>
> 3) You are assuming that an encoding that is indirect is a Differences
> encoding and one that is direct is not. Again, not true. Either way is
> valid for both direct/indirect.
>
> 4) For differences, you need to get the base encoding and then start from
> that.
>
> Leonard
>
> -----Original Message-----
> From: newoutlook [mailto:newoutlo...@yahoo.com]
> Sent: Tuesday, September 15, 2009 6:00 PM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] Conversion of Encoding
>
>
> I reviewed section 9.10.2, "Mapping Character Codes to Unicode Values",
> of the ISO 32000 (PDF 1.7) document. I came up with the following segment
> of code to get the encodings of the fonts for a text string. I am kind of
> stuck on getting the Differences array from the font dictionary, and I am
> not sure how to find Unicode values for the character codes.
>
> dict = reader.getPageN(1);
> dict = (PdfDictionary) dict.get(PdfName.RESOURCES);
> PdfDictionary font_ref = (PdfDictionary) dict.get(PdfName.FONT);
> Set keys = font_ref.getKeys();
> for (Iterator i = keys.iterator(); i.hasNext();) {
>     PdfName name = (PdfName) i.next();
>     System.out.println("font name =" + name);
>     PdfIndirectReference font_inref = (PdfIndirectReference) font_ref.get(name);
>     System.out.println("font indirect ref =" + font_inref);
>
>     PdfDictionary font_content = (PdfDictionary) reader.getPdfObject(font_inref.getNumber());
>     if (!(font_content.get(PdfName.ENCODING)).isIndirect()) {
>         System.out.println("Font encoding =" + font_content.get(PdfName.ENCODING));
>     }
>     else {
>         PdfArray font_diff = (PdfArray) font_content.get(PdfName.DIFFERENCES);
>
>         //for (Iterator j = font_diff.getArrayList().iterator(); j.hasNext();)
>         //{
>         //    PdfString font_diff_str = (PdfString) j.next();
>         //    System.out.println("font diff entry =" + font_diff_str);
>         //}
>     }
> }
>
>
> 1T3XT info wrote:
>>
>> newoutlook wrote:
>>> I was wondering how I can get the encoding for a text string using the
>>> iText API before I interpret the text string.
>>
>> Get the content stream of the page.
>> Find the operator that changes the font and
>> get the operand that refers to the font (e.g. /F1).
>> Look for the corresponding entry in the page resources.
>> Inspect the font dictionary referred to in the page resources
>> for the encoding.
>>
>> Expect no further help than the above;
>> as Leonard told you: stuff like 'content stream',
>> 'font dictionary', etc... is explained in ISO 32000.
>>
>> Low-level methods to get a page's content stream,
>> the resources dictionary, and so on, are explained
>> in chapter 18 of the book "iText in Action".
>> --
>> This answer is provided by 1T3XT BVBA
>> http://www.1t3xt.com/ - http://www.1t3xt.info
>>
>> ------------------------------------------------------------------------------
>> Come build with us!
>> The BlackBerry® Developer Conference in SF, CA
>> is the only developer event you need to attend this year. Jumpstart your
>> developing skills, take BlackBerry mobile applications to market and stay
>> ahead of the curve. Join us from November 9-12, 2009. Register now!
>> http://p.sf.net/sfu/devconf
>> _______________________________________________
>> iText-questions mailing list
>> iText-questions@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/itext-questions
>>
>> Buy the iText book: http://www.1t3xt.com/docs/book.php
>> Check the site with examples before you ask questions:
>> http://www.1t3xt.info/examples/
>> You can also search the keywords list:
>> http://1t3xt.info/tutorials/keywords/
>
> --
> View this message in context:
> http://www.nabble.com/Conversion-of-Encoding-tp23984690p25462597.html
> Sent from the iText - General mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Conversion-of-Encoding-tp23984690p25752712.html
Sent from the iText - General mailing list archive at Nabble.com.
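
Leonard's point 4 in the quoted reply (get the base encoding, then start from that) can be sketched in plain Java as follows. In a Differences array, a number sets the current character code and each glyph name after it is assigned to consecutive codes. This is only a sketch under stated assumptions: the class and method names are hypothetical, the array is assumed to be already converted to Integer and String entries, and reading it out of an iText PdfArray is left out.

```java
import java.util.Arrays;
import java.util.List;

final class DifferencesSketch {

    /**
     * Applies a Differences array to a copy of a base encoding with 256 slots.
     * Integer entries set the current code; String entries are glyph names
     * assigned to the current code, which then advances by one.
     */
    static String[] applyDifferences(String[] baseEncoding, List<Object> differences) {
        // Work on a copy so the base encoding table stays untouched.
        String[] result = Arrays.copyOf(baseEncoding, 256);
        int code = -1; // no code has been seen yet
        for (Object entry : differences) {
            if (entry instanceof Integer) {
                code = (Integer) entry;
            } else if (entry instanceof String) {
                if (code < 0 || code > 255) {
                    throw new IllegalArgumentException("glyph name without a valid code: " + entry);
                }
                result[code++] = (String) entry;
            } else {
                throw new IllegalArgumentException("unexpected entry: " + entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny hypothetical base encoding; a real one would be a full predefined table.
        String[] base = new String[256];
        base[65] = "A";
        base[66] = "B";
        base[67] = "C";

        // Equivalent of /Differences [65 /Alpha /Beta 128 /Euro]
        List<Object> diffs = Arrays.<Object>asList(65, "Alpha", "Beta", 128, "Euro");

        String[] enc = applyDifferences(base, diffs);
        System.out.println(enc[65] + " " + enc[66] + " " + enc[67] + " " + enc[128]);
        // prints "Alpha Beta C Euro"
    }
}
```

The resulting table gives the glyph name for each code; each name would then be looked up in the Adobe Glyph List to get a Unicode value.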