[
https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075907#comment-15075907
]
Tilman Hausherr commented on PDFBOX-3175:
-----------------------------------------
{quote}
If the 1/2 thing is not removed, but just moved to PDFTextStripper class, as I
suggest, the people who use PDFTextStripper class will continue to get the
exact same values of TextPosition as they get now.
{quote}
But people who use TextPosition directly would get the doubled values; see
PrintTextLocations.
{quote}
Moreover, according to the docs of TextPosition getHeight() in both 1.8 and
2.0, it should return "This will get the maximum height of all characters in
this string." Obviously, it currently returns 1/2 of that height, which breaks
the API description.
{quote}
Ouch. Yes, that API doc is incorrect, and I don't have a good idea for a
replacement text. Even if we returned the doubled value, it would still be
incorrect, because the bounding box height is often bigger than the actual
glyph height. So at this time, we try to do everything as in 1.8.
After releasing 2.0 there should probably be a refactoring that
- returns the actual height, computed from the glyph paths, if somebody
wants it
- returns the voodoo values if they are needed
> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
> Key: PDFBOX-3175
> URL: https://issues.apache.org/jira/browse/PDFBOX-3175
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Leo
> Attachments: MarketT_140815-1-marked-1-18.png,
> MarketT_140815-1-marked-1.png, PDFBOX-3175-reduced.pdf, snapshot.png
>
>
> When parsing a PDF document, TextPosition is created with constant text
> height, about 2 times smaller than character width, regardless of font size.
> The following workaround to calculate dyDisplay fixes the issue:
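>     // default glyph space: 1000 units per unit of text space
>     float verticalScaling = 1 / 1000f;
>     if (font instanceof PDType3Font) {
>         // Type 3 fonts define their own glyph space via the font matrix
>         Matrix fontMatrix = font.getFontMatrix();
>         verticalScaling = fontMatrix.getValue(1, 1);
>     }
>     float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;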
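
The arithmetic in the reporter's workaround can be tried without PDFBox on the classpath. The following is only a sketch under stated assumptions: the class name TextHeightSketch, the method dyDisplay and the sample numbers are hypothetical, and the bounding box height, font size and vertical scaling are passed in as plain floats instead of being read from a font object.

```java
// Sketch of the workaround's arithmetic only; no PDFBox dependency.
// TextHeightSketch and dyDisplay are hypothetical names, not PDFBox API.
final class TextHeightSketch {

    // Scale assumed for fonts other than Type 3: 1000 glyph units per text unit.
    static final float DEFAULT_VERTICAL_SCALING = 1 / 1000f;

    // Same expression as in the workaround: bbox height * font size * scaling.
    static float dyDisplay(float bboxHeight, float fontSize, float verticalScaling) {
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // Assumed values: a bounding box 1000 glyph units high at font size 12.
        float simple = dyDisplay(1000f, 12f, DEFAULT_VERTICAL_SCALING);

        // Assumed Type 3 case: the scaling would come from the font matrix,
        // here taken to be 0.01 for a bounding box 80 units high at size 10.
        float type3 = dyDisplay(80f, 10f, 0.01f);

        System.out.println("simple font height: " + simple);
        System.out.println("Type 3 font height: " + type3);
    }
}
```

With these assumed inputs the first result is close to 12 and the second close to 8, so the height scales with the font size rather than staying constant. As noted in the comment above, this is still the bounding box height and may be larger than the actual glyph height.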