[ 
https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075899#comment-15075899
 ] 

Leo commented on PDFBOX-3175:
-----------------------------

If the 1/2 factor is not removed but just moved into the PDFTextStripper class, 
as I suggest, people who use PDFTextStripper will continue to get exactly the 
same TextPosition values as they get now. The output of PDFTextStreamEngine 
won't break for 1.8 users, because the class did not exist at that time, and 
I'm probably its only user, since it is currently package-private in the 
trunk. But it would be a huge advantage for people who start using the new 
PDFTextStreamEngine, if it is officially declared public (I opened a new issue 
suggesting that): they would never start out with wrong values.
Moreover, according to the docs of TextPosition getHeight() in both 1.8 and 
2.0, it should return "This will get the maximum height of all characters in 
this string." Currently it returns 1/2 of that height, which contradicts the 
API description.
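
A minimal sketch of the split proposed above, in plain Java with no PDFBox 
dependency. The class and method names (HeightSplitSketch, engineHeight, 
stripperHeight) are hypothetical and exist only for illustration; they are not 
PDFBox API. The idea is that the engine-level value matches the documented 
"maximum height", while the stripper-level value keeps the halving so that 
existing PDFTextStripper output stays the same.

```java
public class HeightSplitSketch {

    // Hypothetical stand-in for the value PDFTextStreamEngine would report:
    // the full glyph height, with no halving applied.
    static float engineHeight(float fullGlyphHeight) {
        return fullGlyphHeight;
    }

    // Hypothetical stand-in for PDFTextStripper: it applies the 1/2 factor
    // itself, so its callers see the same numbers as before the move.
    static float stripperHeight(float fullGlyphHeight) {
        return engineHeight(fullGlyphHeight) / 2f;
    }

    public static void main(String[] args) {
        float full = 12f; // assumed full glyph height in display units
        System.out.println("engine:   " + engineHeight(full));
        System.out.println("stripper: " + stripperHeight(full));
    }
}
```

With this arrangement only the stripper carries the legacy factor, so a new 
user of the engine would see the full height from the start.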

> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
>                 Key: PDFBOX-3175
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3175
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Leo
>         Attachments: MarketT_140815-1-marked-1-18.png, 
> MarketT_140815-1-marked-1.png, PDFBOX-3175-reduced.pdf, snapshot.png
>
>
> When parsing a PDF document, TextPosition is created with constant text 
> height, about 2 times smaller than the character width, regardless of font 
> size.
> The following workaround to calculate dyDisplay fixes the issue:
>         // Glyph space is 1/1000 of text space for all fonts except Type 3.
>         float verticalScaling = 1 / 1000f;
>         if (font instanceof PDType3Font) {
>             // Type 3 fonts define their own scaling in the font matrix.
>             Matrix fontMatrix = font.getFontMatrix();
>             verticalScaling = fontMatrix.getValue(1, 1);
>         }
>         // Height of the font bounding box scaled to the current font size.
>         float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;
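
The arithmetic of the quoted workaround can be tried without PDFBox. This is 
only a sketch under stated assumptions: the class DyDisplaySketch, the method 
dyDisplay and the sample numbers (a bounding box 1000 glyph units high, font 
sizes 12 and 24) are hypothetical, and the vertical scaling is passed in as a 
plain float instead of being read from a font matrix.

```java
public class DyDisplaySketch {

    // Default scaling for non-Type 3 fonts: glyph space is 1/1000 of text space.
    static final float DEFAULT_VERTICAL_SCALING = 1 / 1000f;

    // bboxHeight is in glyph units; verticalScaling converts glyph units to
    // text space (for a Type 3 font it would be element (1, 1) of the font
    // matrix, as in the quoted workaround).
    static float dyDisplay(float bboxHeight, float fontSize, float verticalScaling) {
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // Assumed values: a 1000-unit-high bounding box at font sizes 12 and 24.
        System.out.println(dyDisplay(1000f, 12f, DEFAULT_VERTICAL_SCALING));
        System.out.println(dyDisplay(1000f, 24f, DEFAULT_VERTICAL_SCALING));
    }
}
```

Unlike the constant height described in the report, the result here grows in 
proportion to the font size.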



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
