[
https://issues.apache.org/jira/browse/PDFBOX-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999951#comment-15999951
]
Tilman Hausherr edited comment on PDFBOX-3780 at 5/7/17 5:55 PM:
-----------------------------------------------------------------
The last commit improves getCapHeight() and getXHeight() for fonts whose
OS/2 table is version 1. Get a snapshot here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.6-SNAPSHOT/
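For context, sCapHeight and sxHeight were only added to the OS/2 table in version 2, so a version 0 or 1 table has no cap height or x-height to read. The commit itself is not shown here. The following is a minimal sketch that assumes a fallback to the top of a reference glyph ('H' for cap height, 'x' for x-height). The class and method names are hypothetical, and the inputs are plain ints in font design units rather than fontbox objects.

{code:java}
/**
 * Hypothetical sketch: pick a cap height (or x-height) for a TrueType font.
 * OS/2 tables of version 2 and later carry sCapHeight/sxHeight; for older
 * tables this sketch ASSUMES a fallback to the yMax of a reference glyph.
 * Inputs are in font design units; the result is in 1000-unit glyph space,
 * the unit used by PDF font descriptors.
 */
final class CapHeightSketch {

    /** Scales a value from font design units to 1000-unit glyph space. */
    static float toGlyphSpace(int fontUnits, int unitsPerEm) {
        return fontUnits * 1000f / unitsPerEm;
    }

    /**
     * @param os2Version    version of the font's OS/2 table
     * @param os2Value      sCapHeight or sxHeight (only defined from version 2)
     * @param referenceYMax yMax of 'H' (cap height) or 'x' (x-height)
     * @param unitsPerEm    unitsPerEm from the 'head' table
     */
    static float metric(int os2Version, int os2Value, int referenceYMax, int unitsPerEm) {
        // Trust the OS/2 field only when the table version defines it and it is set.
        int fontUnits = (os2Version >= 2 && os2Value != 0) ? os2Value : referenceYMax;
        return toGlyphSpace(fontUnits, unitsPerEm);
    }
}
{code}

For example, metric(1, 0, 1024, 2048) ignores the OS/2 table and returns 500, the reference glyph's height scaled to 1000-unit glyph space.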
I haven't touched getAscent() and getDescent(); these are almost correct. The
only problem is that Adobe defines ascent differently than the font itself does:
Adobe's ascent excludes accents, while the font's ascent includes them.
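The PDF specification describes a font descriptor's Ascent as the maximum height above the baseline reached by the font's glyphs, excluding accented characters, whereas a font's hhea ascender usually leaves room for accents. Here is a minimal sketch of the difference. It assumes the caller already has glyph yMax values in font design units and decides which glyphs count as unaccented. The class and method names are hypothetical.

{code:java}
import java.util.Map;

/**
 * Hypothetical sketch contrasting two notions of ascent, both returned in
 * 1000-unit glyph space. Glyph yMax values are in font design units and are
 * ASSUMED to have been read from the font elsewhere.
 */
final class AscentSketch {

    /** The font's own view: hhea ascender, which usually leaves room for accents. */
    static float fontAscent(int hheaAscender, int unitsPerEm) {
        return hheaAscender * 1000f / unitsPerEm;
    }

    /**
     * Descriptor-style ascent: the highest yMax among glyphs the caller
     * considers unaccented (e.g. 'H', 'd', 'l'); accented glyphs are left out.
     */
    static float descriptorAscent(Map<Character, Integer> unaccentedYMax, int unitsPerEm) {
        int max = 0;
        for (int yMax : unaccentedYMax.values()) {
            max = Math.max(max, yMax);
        }
        return max * 1000f / unitsPerEm;
    }
}
{code}

With a made-up hhea ascender of 928 and unaccented glyphs topping out at 760 (1000 units per em), the two values differ by the room the font reserves for accents.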
getHeight() is notoriously unreliable... here's the implementation for the
subsetted font:
{code}
@Override
public float getHeight(int code) throws IOException
{
// todo: really we want the BBox, (for text extraction:)
return (ttf.getHorizontalHeader().getAscender() +
-ttf.getHorizontalHeader().getDescender())
/ ttf.getUnitsPerEm(); // todo: shouldn't this be the yMax/yMin?
}
{code}
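Here is a hedged sketch of two points about the snippet above. First, assuming fontbox's getAscender() and getDescender() return short and getUnitsPerEm() returns int (as in 2.0.x), the quoted expression divides integers. The result is therefore truncated to a whole number of ems before it becomes a float. Second, the todo suggests using the head table's yMax/yMin instead. The hypothetical helpers below take plain values in font design units and return heights in ems.

{code:java}
/**
 * Hypothetical sketch of the two height formulas mentioned in the quoted
 * getHeight(): hhea ascender/descender versus head yMax/yMin.
 * Inputs are in font design units (ASSUMED to be read elsewhere); results are in ems.
 */
final class HeightSketch {

    /** (ascender - descender) / unitsPerEm, divided in float. */
    static float heightFromHhea(short ascender, short descender, int unitsPerEm) {
        // The hhea descender is negative, so subtracting it adds its magnitude.
        return (ascender - descender) / (float) unitsPerEm;
    }

    /** Same expression as quoted: int / int, so the fraction is truncated. */
    static float heightFromHheaIntDivision(short ascender, short descender, int unitsPerEm) {
        return (ascender + -descender) / unitsPerEm;
    }

    /** Alternative from the todo comment: height of the font bounding box. */
    static float heightFromHeadBBox(short yMin, short yMax, int unitsPerEm) {
        return (yMax - yMin) / (float) unitsPerEm;
    }
}
{code}

With an ascender of 1800, a descender of -448 and 2048 units per em, the float version gives 1.09765625 em while the integer version gives 1.0.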
In text extraction, we know that the height is not always reliable, which is why we
use capHeight when getHeight() delivers implausible results.
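The capHeight fallback can be expressed as a small heuristic. This sketch shows one plausible rule, not necessarily the exact one in PDFBox's text extraction. It prefers CapHeight whenever CapHeight is known and the height is either missing (zero) or larger than CapHeight, since an inflated height is the usual failure. Both values are assumed to be in 1000-unit glyph space, and the names are hypothetical.

{code:java}
/**
 * Hypothetical heuristic: fall back to the font descriptor's CapHeight when a
 * height looks implausible. Both values are in 1000-unit glyph space.
 */
final class GlyphHeightSketch {

    static float effectiveHeight(float height, float capHeight) {
        // Only substitute when CapHeight is actually known (non-zero, positive)
        // and the height is missing or larger than CapHeight.
        if (capHeight > 0 && (height <= 0 || capHeight < height)) {
            return capHeight;
        }
        return height;
    }
}
{code}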
> Heights of Characters
> ---------------------
>
> Key: PDFBOX-3780
> URL: https://issues.apache.org/jira/browse/PDFBOX-3780
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.5
> Reporter: Uwe Möser
> Priority: Critical
> Attachments: DejaVuSansCondensed-Bold.ttf, DejaVuSansCondensed.ttf,
> PDFBoxHeightTest.java, PDFBoxHeightTest.pdf
>
>
> The functions
> .getFontDescriptor().getCapHeight()
> .getFontDescriptor().getXHeight()
> .getFontDescriptor().getAscent()
> .getFontDescriptor().getDescent()
> getHeight(int code)
> do not work properly, especially for embedded fonts (PDType0Font).
> The attached file PDFBoxHeightTest.pdf shows where each line is drawn and
> where it should be. The fonts were downloaded from
> http://www.schriftarten-fonts.de/fonts/11283/dejavu_sans_condensed.html
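To place a reference line at, for example, the cap height above a text baseline, font descriptor metrics (expressed in 1000-unit glyph space) must first be scaled by the font size and then added to the baseline's y coordinate. Here is a minimal sketch, assuming an unscaled text matrix. The class and method names are hypothetical.

{code:java}
/**
 * Hypothetical helper: convert font descriptor metrics (1000-unit glyph space)
 * into user-space y positions relative to a text baseline. ASSUMES the text
 * matrix applies no extra vertical scaling.
 */
final class MetricLines {

    /** Vertical offset in user space for a descriptor value at a font size. */
    static float offset(float descriptorValue, float fontSize) {
        return descriptorValue / 1000f * fontSize;
    }

    /** Absolute y of a line at the given metric above (or below) the baseline. */
    static float lineY(float baselineY, float descriptorValue, float fontSize) {
        return baselineY + offset(descriptorValue, fontSize);
    }
}
{code}

For a 20 pt font with its baseline at y = 700, a CapHeight of 750 maps to y = 715 and a Descent of -250 maps to y = 695.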
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)