[ https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074889#comment-15074889 ]
Tilman Hausherr commented on PDFBOX-3175:
-----------------------------------------
The file from PDFBOX-1001 works fine with DrawPrintTextLocations (i.e. the
marks are as expected), but your own file does have problems so I won't close
this issue now :-) The red marks are too small, and the blue ones are
completely wrong, probably because of the rotation. I'll need to compare this
with 1.8 and/or find out why the marks are so small, and fix
DrawPrintTextLocations, and then we'll see... (Sometimes bad red marks come
from incorrect data in the PDF itself.)
I'll be busy with other stuff today, so be patient. And if you do write your
own text extraction, check it against the existing tests if you intend to
extract from different types of files.
> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
> Key: PDFBOX-3175
> URL: https://issues.apache.org/jira/browse/PDFBOX-3175
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Leo
>
> When parsing a PDF document, TextPosition is created with a constant text
> height, about two times smaller than the character width, regardless of font size.
> The following workaround to calculate dyDisplay fixes the issue:
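> // Glyph space is 1/1000 of text space for most font types.
> float verticalScaling = 1/1000f;
> if (font instanceof PDType3Font) {
>     // Type 3 fonts define their own font matrix; use its vertical scale.
>     Matrix fontMatrix = font.getFontMatrix();
>     verticalScaling = fontMatrix.getValue(1, 1);
> }
> float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;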
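
To make the arithmetic in the quoted workaround concrete, here is a minimal
standalone sketch of the same formula in plain Java. It does not use PDFBox:
the class name TextHeightSketch, the method displayHeight and the constant
DEFAULT_GLYPH_SCALE are hypothetical names chosen for illustration, and the
bounding-box heights, font sizes and the Type 3 scale of 0.01 are made-up
sample values, not data from the reporter's file.

```java
public class TextHeightSketch {
    // Assumed default: glyph space is 1/1000 of text space for non-Type 3 fonts.
    static final float DEFAULT_GLYPH_SCALE = 1 / 1000f;

    // Hypothetical helper mirroring the workaround's last line:
    // dyDisplay = bbox height (glyph units) * font size * vertical scaling.
    static float displayHeight(float bboxHeight, float fontSize, float verticalScaling) {
        return bboxHeight * fontSize * verticalScaling;
    }

    public static void main(String[] args) {
        // Ordinary font: sample bbox height of 900 glyph units at 12 pt.
        float regular = displayHeight(900f, 12f, DEFAULT_GLYPH_SCALE);

        // Type 3 font: the scale would come from the font matrix instead;
        // 0.01 is an invented value standing in for fontMatrix.getValue(1, 1).
        float type3 = displayHeight(80f, 10f, 0.01f);

        System.out.println("regular font height: " + regular);
        System.out.println("type 3 font height:  " + type3);
    }
}
```

With these sample inputs the height comes out at roughly 10.8 for the ordinary
font and roughly 8 for the Type 3 font, so it scales with the font size rather
than staying constant, which is the behaviour the workaround is after.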