[
https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075899#comment-15075899
]
Leo commented on PDFBOX-3175:
-----------------------------
If the 1/2 factor is not removed, but just moved to the PDFTextStripper class,
as I suggest, people who use PDFTextStripper will continue to get exactly the
same TextPosition values as they get now. The output of PDFTextStreamEngine
won't break for 1.8 users, because the class did not exist at the time and
because I'm probably its only user, since it is currently package-private in
the trunk. But it would be a huge advantage for people who start using the new
PDFTextStreamEngine, if it is officially declared public (I opened a new issue
suggesting that): they would never start out with wrong values.
Moreover, according to the docs of TextPosition getHeight() in both 1.8 and
2.0, it should return "This will get the maximum height of all characters in
this string." Obviously, it currently returns 1/2 of that height, which
contradicts the API description.
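
To illustrate the proposal, here is a minimal sketch of where the halving
would live. EngineSketch and StripperSketch are hypothetical stand-ins, not the
real PDFBox classes, and the 1/1000 glyph-space scaling is assumed for
simplicity: the base class reports the full height, and only the subclass
applies the legacy 1/2 factor.

```java
// Hypothetical stand-ins for illustration only; NOT the real PDFBox classes.
class HalvingDemo {
    public static void main(String[] args) {
        // Same glyph (bbox height 1000 glyph units) at font size 12.
        System.out.println(new EngineSketch().textHeight(1000f, 12f));   // prints 12.0
        System.out.println(new StripperSketch().textHeight(1000f, 12f)); // prints 6.0
    }
}

class EngineSketch {
    /** Full height in text-space units: bbox height (glyph units) scaled by font size. */
    float textHeight(float bboxHeight, float fontSize) {
        return bboxHeight * fontSize / 1000f;
    }
}

class StripperSketch extends EngineSketch {
    /** Keeps the legacy halved value so existing callers see unchanged output. */
    @Override
    float textHeight(float bboxHeight, float fontSize) {
        return super.textHeight(bboxHeight, fontSize) / 2f;
    }
}
```

With this split, callers of the subclass keep the values they get today, while
new callers of the base class get the height that the getHeight() docs
describe.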
> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
> Key: PDFBOX-3175
> URL: https://issues.apache.org/jira/browse/PDFBOX-3175
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Leo
> Attachments: MarketT_140815-1-marked-1-18.png,
> MarketT_140815-1-marked-1.png, PDFBOX-3175-reduced.pdf, snapshot.png
>
>
> When parsing a PDF document, TextPosition is created with constant text
> height, about 2 times smaller than character width, regardless of font size.
> The following workaround to calculate dyDisplay fixes the issue:
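> // Most fonts use a glyph space of 1/1000 of a text-space unit.
> float verticalScaling = 1/1000f;
> if (font instanceof PDType3Font) {
>     // Type 3 fonts define their own font matrix; use its vertical scale.
>     Matrix fontMatrix = font.getFontMatrix();
>     verticalScaling = fontMatrix.getValue(1, 1);
> }
> float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;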
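
The arithmetic of the workaround above can be tried without PDFBox. The
following is only a sketch: DisplayHeightSketch, the isType3 flag and
fontMatrixScaleY are hypothetical names standing in for the font type check and
the fontMatrix.getValue(1, 1) lookup, and the input numbers are made up.

```java
// Standalone sketch of the workaround's arithmetic; no PDFBox types are used.
class DisplayHeightSketch {
    // Default glyph-space scaling used for non-Type 3 fonts.
    static final float DEFAULT_VERTICAL_SCALING = 1 / 1000f;

    /**
     * @param bboxHeight       glyph bounding box height, in glyph-space units
     * @param fontSize         font size from the text state
     * @param isType3          stands in for "font instanceof PDType3Font"
     * @param fontMatrixScaleY stands in for fontMatrix.getValue(1, 1); ignored unless isType3
     */
    static float displayHeight(float bboxHeight, float fontSize,
                               boolean isType3, float fontMatrixScaleY) {
        float verticalScaling = isType3 ? fontMatrixScaleY : DEFAULT_VERTICAL_SCALING;
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // Same glyph at two font sizes: the height now follows the font size.
        System.out.println(displayHeight(700f, 10f, false, 0f));
        System.out.println(displayHeight(700f, 20f, false, 0f));
        // A Type 3 font with an assumed vertical scale of 0.1 in its font matrix.
        System.out.println(displayHeight(7f, 10f, true, 0.1f));
    }
}
```

The three calls give approximately 7, 14 and 7, so the height is no longer
constant but scales with the font size, which is the behaviour the issue asks
for.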