[ 
https://issues.apache.org/jira/browse/PDFBOX-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244998#comment-14244998
 ] 

John Hewson edited comment on PDFBOX-2563 at 12/12/14 11:46 PM:
----------------------------------------------------------------

We need a generic way to find substitute fonts, rather than hardcoding them. We 
only hardcode the standard 14 fonts and use heuristics to find all other 
missing fonts.

There's a problem with trying to find substitutes using the Unicode characters, 
due to [Han unification|http://www.unicode.org/faq/han_cjk.html], where the 
same character might be drawn very differently in Japanese vs Korean vs Chinese, 
yet all three share the same Unicode code point.

TTF fonts don't explicitly label the languages which they contain. One idea I 
can suggest is to sniff the languages in the 'name' table, which is the only 
place in a TTF which is language-specific. We need to categorise each font on 
disk as C, J, or K, or none of these. I'd be interested to know how Evince is 
doing this.
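
A rough sketch of the 'name' table idea, not an actual PDFBox implementation. 
The class CjkNameSniffer and its methods are hypothetical, and it assumes the 
(platformId, languageId) pair of every name record has already been read from 
the font. The language IDs are the standard Windows and Macintosh values from 
the TrueType 'name' table; how reliably fonts on disk carry localised names is 
an open question, so treat the result as a hint only.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper: guesses C/J/K from the language IDs of 'name' records. */
final class CjkNameSniffer {
    enum Script { CHINESE, JAPANESE, KOREAN }

    // Platform IDs as defined for the TrueType 'name' table.
    static final int PLATFORM_MACINTOSH = 1;
    static final int PLATFORM_WINDOWS = 3;

    /** Maps one record's (platformId, languageId) to a script, or null if not CJK. */
    static Script classify(int platformId, int languageId) {
        if (platformId == PLATFORM_WINDOWS) {
            // Windows language IDs keep the primary language in the low 10 bits,
            // so zh-TW (0x0404), zh-CN (0x0804) and zh-HK (0x0C04) all map to 0x04.
            switch (languageId & 0x3FF) {
                case 0x04: return Script.CHINESE;
                case 0x11: return Script.JAPANESE;
                case 0x12: return Script.KOREAN;
                default:   return null;
            }
        }
        if (platformId == PLATFORM_MACINTOSH) {
            switch (languageId) {
                case 11: return Script.JAPANESE;
                case 19: // Traditional Chinese
                case 33: // Simplified Chinese
                    return Script.CHINESE;
                case 23: return Script.KOREAN;
                default: return null;
            }
        }
        return null; // other platforms are ignored in this sketch
    }

    /**
     * Collects every CJK script seen in the records.
     * Each entry of records is {platformId, languageId}.
     * An empty result means "none of C, J, K".
     */
    static Set<Script> sniff(int[][] records) {
        Set<Script> found = EnumSet.noneOf(Script.class);
        for (int[] record : records) {
            Script script = classify(record[0], record[1]);
            if (script != null) {
                found.add(script);
            }
        }
        return found;
    }
}
```

A font whose name records are only English (e.g. Windows 0x0409) gives an empty 
set, while one that also has a Windows 0x0411 record would be tagged J. A font 
can land in more than one category, which the fallback code would have to rank.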


> [PATCH] Use cmap for Type0/TTF fallback
> ---------------------------------------
>
>                 Key: PDFBOX-2563
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2563
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: simon steiner
>            Assignee: John Hewson
>             Fix For: 2.0.0
>
>         Attachments: VariousKFontsNotembeded218.PDF, ttfcmapfallback.patch
>
>
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage 
> VariousKFontsNotembeded218.PDF
> This patch addresses some of the issues in PDFBOX-2509.



