[ 
https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075907#comment-15075907
 ] 

Tilman Hausherr commented on PDFBOX-3175:
-----------------------------------------

{quote}
If the 1/2 thing is not removed, but just moved to PDFTextStripper class, as I 
suggest, the people who use PDFTextStripper class will continue to get the 
exact same values of TextPosition as they get now. 
{quote}
But people who use TextPosition directly would get the doubled values; see 
PrintTextLocations.

{quote}
Moreover, according to the docs of TextPosition getHeight() in both 1.8 and 2.0, 
it should return "This will get the maximum height of all characters in this 
string." Obviously, it currently returns 1/2 of that height, which breaks the 
API description.
{quote}
Ouch. Yes, that API doc is incorrect, and I don't have a good idea for a 
replacement text. Even if we returned the doubled value, it would still be 
incorrect, because the bounding box height is often bigger than the actual 
glyph height. So for now, we try to do everything as in 1.8.

After releasing 2.0 there should probably be a refactoring that
- returns the actual height by getting it from the glyph paths if somebody 
wants it
- returns the voodoo values if they are needed
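
To illustrate the first point, here is a minimal sketch of measuring a glyph's 
height from its outline rather than from the font bounding box. It is not 
PDFBox code: the class GlyphHeightSketch, the methods glyphHeight and boxPath, 
and the sample coordinates are all hypothetical, and it assumes the outline is 
already available as a java.awt.geom.GeneralPath in glyph space (1/1000 units 
for most fonts).

```java
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;

public class GlyphHeightSketch {

    /**
     * Height of a glyph outline scaled to text space.
     * glyphPath is assumed to be in glyph space; verticalScaling is the
     * glyph-to-text-space factor (assumed 1/1000f for non-Type3 fonts).
     */
    static float glyphHeight(GeneralPath glyphPath, float fontSize, float verticalScaling) {
        // getBounds2D may include curve control points on some JDK versions,
        // so for curved outlines this can be an upper bound, not the exact height.
        Rectangle2D bounds = glyphPath.getBounds2D();
        return (float) bounds.getHeight() * fontSize * verticalScaling;
    }

    /** Hypothetical stand-in for a real glyph outline: a box from yMin to yMax. */
    static GeneralPath boxPath(float yMin, float yMax) {
        GeneralPath path = new GeneralPath();
        path.moveTo(50f, yMin);
        path.lineTo(450f, yMin);
        path.lineTo(450f, yMax);
        path.lineTo(50f, yMax);
        path.closePath();
        return path;
    }

    public static void main(String[] args) {
        float fontSize = 12f;
        // A small glyph (for example an "x") occupying 0..450 of a 1000-unit em.
        float glyph = glyphHeight(boxPath(0f, 450f), fontSize, 1 / 1000f);
        // A font bounding box spanning -200..900 for comparison.
        float bbox = glyphHeight(boxPath(-200f, 900f), fontSize, 1 / 1000f);
        System.out.println("glyph height: " + glyph + ", bbox height: " + bbox);
    }
}
```

The gap between the two numbers is the reason the bounding box cannot satisfy 
the "maximum height of all characters" wording in the API doc.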


> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
>                 Key: PDFBOX-3175
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3175
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Leo
>         Attachments: MarketT_140815-1-marked-1-18.png, 
> MarketT_140815-1-marked-1.png, PDFBOX-3175-reduced.pdf, snapshot.png
>
>
> When parsing a PDF document, TextPosition is created with constant text 
> height, about 2 times smaller than character width, regardless of font size.
> The following workaround to calculate dyDisplay fixes the issue:
>         float verticalScaling = 1/1000f;
>         if (font instanceof PDType3Font) {
>             Matrix fontMatrix = font.getFontMatrix();
>             verticalScaling = fontMatrix.getValue(1, 1);
>         }
>         float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;
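
The reporter's formula above can be tried in isolation. This is only a sketch 
with plain floats in place of the PDFBox objects: the class DyDisplaySketch, 
the isType3 flag and the fontMatrixScaleY parameter are hypothetical stand-ins 
for the instanceof PDType3Font check and fontMatrix.getValue(1, 1), and the 
sample numbers are made up.

```java
public class DyDisplaySketch {

    /**
     * Same arithmetic as the workaround: bounding box height times font size
     * times a vertical scaling factor. For Type 3 fonts the factor comes from
     * the font matrix; otherwise it is assumed to be 1/1000.
     */
    static float dyDisplay(float bboxHeight, float fontSize,
                           boolean isType3, float fontMatrixScaleY) {
        float verticalScaling = 1 / 1000f;
        if (isType3) {
            verticalScaling = fontMatrixScaleY;
        }
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // The result now grows with the font size instead of staying constant.
        System.out.println(dyDisplay(900f, 10f, false, 0f));
        System.out.println(dyDisplay(900f, 20f, false, 0f));
        // Type 3 font with a hypothetical font matrix scale of 0.01.
        System.out.println(dyDisplay(90f, 10f, true, 0.01f));
    }
}
```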



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
