[ https://issues.apache.org/jira/browse/PDFBOX-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984469#comment-14984469 ]

Tilman Hausherr commented on PDFBOX-3078:
-----------------------------------------

In 1.8, the height is taken from the AFM info and is 703 * 0.001 * 9 = 6.327.

In 2.0, the height is taken from the actual font (Arial TT). The BBox height is 
1324; half of it is taken (665) and multiplied by the actual font matrix, 
which is 0.000488, i.e. about half of 0.001.
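The two computations above can be checked with a few lines of Java. This is a minimal sketch: the glyph heights (703 from the AFM data, 665 as half the BBox height) and the font size of 9 come from this comment, while reading 0.000488 as 1/2048 (Arial's 2048 units per em) is an assumption; the class and method names are hypothetical.

```java
class HeightArithmetic {
    static final double FONT_SIZE = 9.0;

    // 1.8 path: AFM glyph height in 1/1000 text-space units, scaled by the font size.
    static double height18(double afmHeight) {
        return afmHeight * 0.001 * FONT_SIZE;
    }

    // 2.0 path: half the BBox height, scaled by the TrueType font matrix
    // (assumed here to be 1/2048, i.e. the "0.000488") and then by the font size.
    static double height20(double halfBBoxHeight) {
        return halfBBoxHeight * (1.0 / 2048) * FONT_SIZE;
    }

    public static void main(String[] args) {
        double h18 = height18(703);
        double h20 = height20(665);
        System.out.printf("1.8 height: %.4f%n", h18);
        System.out.printf("2.0 height: %.4f%n", h20);
        System.out.printf("ratio 2.0/1.8: %.4f%n", h20 / h18);
    }
}
```

The 2.0 figure comes out at about 2.92, close to the 2.9236078 in the issue's output, so the roughly 0.5 factor in the font matrix accounts for the half-size height.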

From the spec:
{code}
The glyph coordinate system is the space in which an individual character’s 
glyph is defined. All path coordinates and metrics shall be interpreted in 
glyph space. For all font types except Type 3, the units of glyph space are 
one-thousandth of a unit of text space; for a Type 3 font, the transformation 
from glyph space to text space shall be defined by a font matrix specified in 
an explicit FontMatrix entry in the font
{code}
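For reference, a PDF font matrix is a six-number array [a b c d e f] that maps a glyph-space point (x, y) to (a*x + c*y + e, b*x + d*y + f). The stand-alone sketch below is not PDFBox's own Matrix class; FontMatrixDemo and its transformPoint are hypothetical names. It applies the spec's default 1/1000 scaling and, as an assumed comparison, a matrix scaled by 1/2048 to the same glyph height. Note that transformPoint(0, h).y reduces to d*h + f, so only the vertical scale and the vertical translation affect the height.

```java
class FontMatrixDemo {
    // Applies a PDF matrix [a b c d e f] to the point (x, y), using the
    // row-vector convention of the PDF spec: x' = a*x + c*y + e, y' = b*x + d*y + f.
    static double[] transformPoint(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    public static void main(String[] args) {
        // Glyph space for all non-Type 3 fonts: 1/1000 of a text-space unit.
        double[] spec = {0.001, 0, 0, 0.001, 0, 0};
        // Assumed matrix for a TrueType font with 2048 units per em.
        double[] trueType = {1.0 / 2048, 0, 0, 1.0 / 2048, 0, 0};

        double glyphHeight = 665;
        // Text-space height under the spec's 1/1000 rule.
        System.out.println(transformPoint(spec, 0, glyphHeight)[1]);
        // Text-space height under the 1/2048 matrix: roughly half as large.
        System.out.println(transformPoint(trueType, 0, glyphHeight)[1]);
    }
}
```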

A quick fix would be to replace this line in PDFTextStreamEngine
{code}
float height = font.getFontMatrix().transformPoint(0, glyphHeight).y;
{code}
with
{code}
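float height;
if (font instanceof PDType3Font)
{
    // Type 3: glyph space is defined by the font's explicit FontMatrix
    height = font.getFontMatrix().transformPoint(0, glyphHeight).y;
}
else
{
    // all other font types: glyph space units are 1/1000 of text space
    height = glyphHeight * 0.001f;
}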
{code}
but this changes the extraction result for the file attached to PDFBOX-679 
(which is not in the official tests, but is in mine).
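The proposed branch can also be exercised outside PDFBox. In this minimal sketch a boolean isType3 stands in for the instanceof PDType3Font check, the Type 3 matrix is reduced to its d and f entries (the only ones that matter for transformPoint(0, h).y), and GlyphHeight and textSpaceHeight are hypothetical names. Assuming glyphHeight is the 665 mentioned earlier in this comment and the result is then scaled by the font size of 9, the non-Type 3 path gives 665 * 0.001 * 9 = 5.985 instead of about 2.92.

```java
class GlyphHeight {
    // Converts a glyph-space height to text space following the quoted spec rule:
    // Type 3 fonts use their explicit FontMatrix (here only its vertical scale d
    // and vertical translation f), every other font type uses a fixed 1/1000 scale.
    static double textSpaceHeight(boolean isType3, double d, double f, double glyphHeight) {
        if (isType3) {
            // y component of transformPoint(0, glyphHeight)
            return d * glyphHeight + f;
        }
        return glyphHeight * 0.001;
    }

    public static void main(String[] args) {
        double fontSize = 9.0;
        double trueTypeScale = 1.0 / 2048; // assumed value behind "0.000488"

        // Current behaviour: the TrueType font matrix is applied to the height.
        double before = trueTypeScale * 665 * fontSize;
        // Proposed behaviour: non-Type 3 fonts ignore the matrix and use 1/1000.
        double after = textSpaceHeight(false, trueTypeScale, 0, 665) * fontSize;
        System.out.printf("before: %.4f, after: %.4f%n", before, after);
    }
}
```

Keeping the FontMatrix path only for Type 3 fonts follows the spec text literally; as noted above, it does change results for at least one other test file, so it is a starting point rather than a finished fix.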

> Text height coming in at half size, regression from 1.8
> -------------------------------------------------------
>
>                 Key: PDFBOX-3078
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3078
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: Joel Hirsh
>         Attachments: wrongsize.pdf
>
>
> Running the 11/1 development build.
> PrintTextLocations on the attached file reports a height of 2.9, which is incorrect.
> String[30.699997,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.0040016]1
> String[35.704,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]2
> String[40.707996,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]8
> String[45.711994,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]6
> String[50.715992,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=5.003998]2
> String[63.79999,144.80005 fs=9.0 xscale=9.0 height=2.9236078 space=2.5020003 width=4.2210045]^
> With the same file, version 1.8 reports a height of about 6.5, which is roughly right:
> String[30.699997,144.80005 fs=9.0 xscale=9.0 height=6.327 space=2.5020003 width=5.0040016]1
> String[35.704,144.80005 fs=9.0 xscale=9.0 height=6.327 space=2.5020003 width=5.0040016]2
> String[40.708,144.80005 fs=9.0 xscale=9.0 height=6.4980006 space=2.5020003 width=5.0040016]8
> String[45.712,144.80005 fs=9.0 xscale=9.0 height=6.4980006 space=2.5020003 width=5.0040016]6
> String[50.716003,144.80005 fs=9.0 xscale=9.0 height=6.4980006 space=2.5020003 width=5.0040016]2
> String[63.800007,144.80005 fs=9.0 xscale=9.0 height=3.8160002 space=2.5020003 width=4.220997]^



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
