On Mon, 2 May 2011 09:19:37 +0000
"Balraj Balakrishnan, Integra-PDY, IN" <[email protected]> wrote:

>As am new to freetype and all these font stuffs, I couldn't rather
>frame my requirement in a right manner. I shall be making an another
>attempt to bring about much more clarity in what I really want from
>freetype:

OK. I don't think there is any manner specific to this list; making
the input, the process, and the output clear is important on any open
source mailing list.

>1. The scenario here is, we are trying to convert the source PDF into
>an HTML, while doing this there are many fonts in the PDF which are
>extracted or mapped to a wrong character.

I see. What software are you using to translate from PDF to HTML?
Could you post (or upload to a web site) a sample PDF that shows
the issue?

Basically, an elementary font object in PDF (the data segment that you
splice out of the PDF and pass to FT_New_Face()) is not expected to
hold an interface to character encoding. The relationship between the
glyph index (or glyph name) and the character code is given by the
/Encoding or /ToUnicode elements in the wrapping font object in the
PDF (which refers to its elementary font object via /BaseFont). The
referrer's /Encoding dictionary can override the built-in encoding
info in the referred font. I think existing software such as pdftohtml
already does this work reasonably well.

>So we are extracting the font files from the PDF, to
>convert glyph's (Symbols, Unicode) in the font file
>as an image and replace the wrongly extracted characters
>/Symbols/Unicode in the HTML file with the image.

As I wrote above, the extracted font file alone is an insufficient
resource for guessing the code points of the glyphs.

>In the above mentioned scenario the image should maintain
>its position in the outline in order place it in an HTML
>file. If you look at the image below the fonts Quote
>right and the Comma is differentiated based on its position
>in a given line.

Are you saying that your program (at present) cannot determine the
character code points of the single quote glyph and the comma glyph
from the PDF, and so you want to guess the code points by looking
inside the font? Does Adobe Acrobat extract the text from your PDF?
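
To illustrate the /ToUnicode point above, here is a minimal sketch in
Python of reading the character-code-to-Unicode pairs from such a CMap.
It is only a sketch under assumptions: the sample CMap text and the
name parse_bfchar are made up for illustration, only "bfchar" sections
are handled ("bfrange" sections are ignored), and the stream is assumed
to be already extracted from the PDF and decompressed.

```python
import re

# Hypothetical /ToUnicode CMap fragment, invented for illustration.
SAMPLE_CMAP = """
/CIDInit /ProcSet findresource begin
begincmap
1 begincodespacerange
<00> <FF>
endcodespacerange
2 beginbfchar
<01> <2019>
<02> <002C>
endbfchar
endcmap
"""

# Each bfchar section lists "<source code> <destination>" hex pairs.
BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text):
    """Return {character code: Unicode string} from the bfchar sections.

    Assumption: bfrange sections are not handled in this sketch.
    """
    mapping = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in BFCHAR_ENTRY.findall(block):
            code = int(src, 16)
            # The destination is hex-encoded UTF-16BE text.
            mapping[code] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


if __name__ == "__main__":
    for code, text in sorted(parse_bfchar(SAMPLE_CMAP).items()):
        print("0x%02X -> U+%04X" % (code, ord(text[0])))
```

With the sample above, code 0x01 maps to U+2019 (quote right) and code
0x02 maps to U+002C (comma), so the two glyphs are told apart by the
PDF's own mapping without looking at their outlines or positions.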

_______________________________________________
Freetype mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/freetype
