[jira] [Comment Edited] (PDFBOX-3780) Heights of Characters

Tilman Hausherr (JIRA) Sun, 07 May 2017 10:56:48 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999951#comment-15999951 ]


Tilman Hausherr edited comment on PDFBOX-3780 at 5/7/17 5:55 PM:
-----------------------------------------------------------------

The last commit improves getCapHeight() and getXHeight() for fonts whose 
OS/2 table is version 1. Get a snapshot here:
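For context, the sCapHeight and sxHeight fields were only added in version 2 of the OS/2 table, so a version 1 table has no value to read and the height has to come from somewhere else. The sketch below assumes, purely for illustration, that the fallback measures the top (yMax) of a reference glyph ('H' for the cap height, 'x' for the x-height) and scales it to the 1000-unit glyph space used by PDF font descriptors. The class name, the map of glyph tops and the numbers are hypothetical; this is not the PDFBox API and not necessarily what the commit does.

```java
import java.util.Map;

// Hypothetical sketch, not PDFBox API: pick cap height / x-height from OS/2
// (version 2 and later) or fall back to the yMax of a reference glyph.
class CapHeightSketch {

    // Scale a value in font design units to PDF glyph space (1000 units per em).
    static float toGlyphSpace(int fontUnits, int unitsPerEm) {
        return fontUnits * 1000f / unitsPerEm;
    }

    // os2Value is sCapHeight or sxHeight; it is only trusted for OS/2 version >= 2.
    // glyphYMax is a hypothetical map from character to glyph yMax in font units.
    static float height(int os2Version, int os2Value, Map<Character, Integer> glyphYMax,
                        char referenceChar, int unitsPerEm) {
        if (os2Version >= 2 && os2Value > 0) {
            return toGlyphSpace(os2Value, unitsPerEm);
        }
        Integer yMax = glyphYMax.get(referenceChar);
        return yMax == null ? 0f : toGlyphSpace(yMax, unitsPerEm);
    }

    public static void main(String[] args) {
        // Hypothetical glyph tops for a font with 2048 units per em
        Map<Character, Integer> yMax = Map.of('H', 1536, 'x', 1024);
        // OS/2 version 1: no sCapHeight/sxHeight, so the glyphs are measured instead
        System.out.println(height(1, 0, yMax, 'H', 2048)); // 750.0
        System.out.println(height(1, 0, yMax, 'x', 2048)); // 500.0
        // OS/2 version 2 with a (hypothetical) sCapHeight of 1490: the table value wins
        System.out.println(height(2, 1490, yMax, 'H', 2048));
    }
}
```

Measuring a glyph is only a heuristic; a font without an 'H' or 'x' glyph falls through to 0 here, which a caller would have to treat as "unknown".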
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.6-SNAPSHOT/

I haven't touched getAscent() and getDescent()... these are almost correct. The 
only problem is that Adobe defines ascent differently from the font itself: 
Adobe leaves accented characters out, but the font's own values include them.
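To make the difference concrete: the PDF specification defines the font descriptor's Ascent as the maximum height above the baseline reached by glyphs in the font, excluding accented characters. A minimal sketch of such an Adobe-style ascent, assuming we already have per-character glyph tops in a (hypothetical) map, skips every character whose canonical decomposition contains a combining mark, such as 'É' (E + U+0301). The names and numbers are illustrative only.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical sketch, not PDFBox API: ascent that ignores accented glyphs.
class AdobeAscentSketch {

    // True if the character decomposes (NFD) into a base letter plus a
    // non-spacing combining mark, e.g. '\u00C9' -> 'E' + U+0301.
    static boolean isAccented(char c) {
        String nfd = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
        for (int i = 0; i < nfd.length(); i++) {
            if (Character.getType(nfd.charAt(i)) == Character.NON_SPACING_MARK) {
                return true;
            }
        }
        return false;
    }

    // Maximum yMax over non-accented glyphs, scaled from font units to
    // 1000-unit glyph space.
    static float ascentExcludingAccents(Map<Character, Integer> glyphYMax, int unitsPerEm) {
        int max = 0;
        for (Map.Entry<Character, Integer> e : glyphYMax.entrySet()) {
            if (!isAccented(e.getKey())) {
                max = Math.max(max, e.getValue());
            }
        }
        return max * 1000f / unitsPerEm;
    }

    public static void main(String[] args) {
        // Hypothetical glyph tops in font units (2048 units per em)
        Map<Character, Integer> yMax = Map.of('H', 1536, 'd', 1600, '\u00C9', 1950);
        System.out.println(ascentExcludingAccents(yMax, 2048)); // 781.25 ('\u00C9' ignored)
    }
}
```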

getHeight() is notoriously unreliable... here's the implementation for the 
subsetted font:
{code}
    @Override
    public float getHeight(int code) throws IOException
    {
        // todo: really we want the BBox, (for text extraction:)
        return (ttf.getHorizontalHeader().getAscender()
                + -ttf.getHorizontalHeader().getDescender())
                / ttf.getUnitsPerEm(); // todo: shouldn't this be the yMax/yMin?
    }
{code}
In text extraction, we know that the height is not always reliable, which is why 
we fall back to capHeight when getHeight() delivers implausible results.
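As an aside on the snippet above: assuming getAscender()/getDescender() return short and getUnitsPerEm() returns int (as in PDFBox 2.0, but worth double-checking), the expression uses integer division, so the result is truncated to a whole number of ems before being returned as a float. The sketch below computes the same hhea-based height in floating point, scaled to 1000-unit glyph space, plus a capHeight fallback of the kind described above. The fallback rule (prefer capHeight when the height is zero or larger than capHeight) is an assumption for illustration, not a copy of PDFBox's text extraction code, and the hhea numbers are hypothetical.

```java
// Hypothetical sketch, not PDFBox code: hhea-based glyph height in floating
// point, with a capHeight fallback for text extraction.
class HeightSketch {

    // Height in 1000-unit glyph space. The hhea descender is negative, so
    // subtracting it adds its magnitude; 1000f forces floating-point division.
    static float hheaHeight(short ascender, short descender, int unitsPerEm) {
        return (ascender - descender) * 1000f / unitsPerEm;
    }

    // Assumed rule: use capHeight (glyph space) when the computed height is
    // missing or larger than capHeight.
    static float heightForExtraction(float glyphHeight, float capHeight) {
        if (capHeight != 0 && (glyphHeight == 0 || capHeight < glyphHeight)) {
            return capHeight;
        }
        return glyphHeight;
    }

    public static void main(String[] args) {
        short ascender = 1901, descender = -483; // hypothetical hhea values
        int unitsPerEm = 2048;
        // Same arithmetic as the snippet above with integer operands: 2384 / 2048
        int truncated = (ascender + -descender) / unitsPerEm;
        System.out.println(truncated); // 1
        System.out.println(hheaHeight(ascender, descender, unitsPerEm));
        System.out.println(heightForExtraction(1164.0625f, 729f)); // 729.0
    }
}
```

Using yMax/yMin from the head table or the glyph's own bounding box, as the todo comments suggest, would be an alternative to the hhea values; the sketch does not attempt that.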


> Heights of Characters
> ---------------------
>
>                 Key: PDFBOX-3780
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3780
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.5
>            Reporter: Uwe Möser
>            Priority: Critical
>         Attachments: DejaVuSansCondensed-Bold.ttf, DejaVuSansCondensed.ttf, 
> PDFBoxHeightTest.java, PDFBoxHeightTest.pdf
>
>
> The functions
> .getFontDescriptor().getCapHeight()
> .getFontDescriptor().getXHeight()
> .getFontDescriptor().getAscent()
> .getFontDescriptor().getDescent()
> getHeight(int code)
> do not work properly, especially for embedded fonts (PDType0Font).
> Please see the attached file PDFBoxHeightTest.pdf, which shows where the line 
> is and where it should be. The fonts were downloaded from 
> http://www.schriftarten-fonts.de/fonts/11283/dejavu_sans_condensed.html
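For reference when checking the attached PDF: font descriptor metrics such as CapHeight, XHeight, Ascent and Descent are expressed in glyph space, where 1000 units equal one unit of text space. With a text matrix that only translates (no scaling or rotation), a line at one of these metrics therefore sits at baseline + metric / 1000 * fontSize in user space. The helper name and the numbers below are made up; the sketch only does that arithmetic and ignores text rise.

```java
// Hypothetical helper, not PDFBox API: turns a font descriptor metric
// (glyph space, 1000 units per text space unit) into a user-space y value.
class LinePositionSketch {

    // Assumes the text matrix only translates and there is no text rise.
    static float metricToUserY(float baselineY, float metric, float fontSize) {
        return baselineY + metric / 1000f * fontSize;
    }

    public static void main(String[] args) {
        float baseline = 700f, fontSize = 20f;
        System.out.println(metricToUserY(baseline, 750f, fontSize));  // 715.0 (cap height line)
        System.out.println(metricToUserY(baseline, -250f, fontSize)); // 695.0 (descent line)
    }
}
```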



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

