[ https://issues.apache.org/jira/browse/PDFBOX-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075881#comment-15075881 ]

Tilman Hausherr commented on PDFBOX-3175:
-----------------------------------------

I get this result with both PrintTextLocations examples on the 
PDFBOX-3175-reduced.pdf file:
{code}
String[400.0,200.0 fs=20.0 xscale=20.0 height=11.2 space=5.5200005 width=16.200012]M
String[416.2,200.0 fs=20.0 xscale=20.0 height=11.2 space=5.5200005 width=5.1600037]I
String[421.36002,200.0 fs=20.0 xscale=20.0 height=11.2 space=5.5200005 width=14.480011]C
String[435.84003,200.0 fs=20.0 xscale=20.0 height=11.2 space=5.5200005 width=13.440002]E
String[449.28003,200.0 fs=20.0 xscale=20.0 height=11.2 space=5.5200005 width=12.76001]X
{code}
You wrote that
{quote}
Removing the division by 2 makes call to TextPosition almost identical to 1.8 style behavior
{quote}
What value is different for you? Note that this
{code}
float glyphHeight = bbox.getHeight() / 2;
{code}
is the intended behavior at this time. Yes, it looks weird, but it is a good 
value for identifying text that belongs on the same line. If you don't like it, use the 
solution for the blue marks in DrawPrintTextLocations, which uses only the bounding 
box, with no heuristics and no adjustment of "wild" values.
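To illustrate why a smaller height can help with grouping lines, here is a minimal sketch in plain Java (no PDFBox classes) of a simplified line-grouping test. The overlap rule, the 1000-unit bbox height and the 15-unit line spacing are all assumptions for illustration; this is not the exact logic in PDFTextStripper. With the full bbox height, glyphs on two neighbouring lines overlap vertically and would be merged into one line; with half the height they stay apart.

```java
// Minimal sketch: effect of the halved glyph height on a simple line-grouping test.
// All values are hypothetical, and the overlap rule is a simplified assumption,
// not PDFTextStripper's actual implementation.
public class LineGroupingSketch {

    /** Converts a glyph-space height to text space for a non-Type 3 font (1/1000 units). */
    static float glyphHeightToText(float glyphHeight, float fontSize) {
        return glyphHeight / 1000f * fontSize;
    }

    /**
     * Simplified vertical-overlap test. y grows downward in this sketch, and each glyph box
     * extends upward from its baseline by its height. Two glyphs are treated as being on the
     * same line if either baseline falls inside the other glyph's box.
     */
    static boolean sameLine(float y1, float height1, float y2, float height2) {
        return (y2 <= y1 && y2 >= y1 - height1)
            || (y1 <= y2 && y1 >= y2 - height2);
    }

    public static void main(String[] args) {
        float fontSize = 20f;
        float bboxHeight = 1000f; // hypothetical font bbox height in glyph units
        float lineA = 200f;       // baseline of the first line
        float lineB = 215f;       // baseline of the next line (hypothetical 15-unit leading)

        float fullHeight = glyphHeightToText(bboxHeight, fontSize);     // full bbox height
        float halfHeight = glyphHeightToText(bboxHeight / 2, fontSize); // bbox height / 2

        // The full height makes the two lines overlap; the halved height keeps them apart.
        System.out.println("full bbox height, same line: "
            + sameLine(lineA, fullHeight, lineB, fullHeight));
        System.out.println("half bbox height, same line: "
            + sameLine(lineA, halfHeight, lineB, halfHeight));
    }
}
```

With these numbers it prints true for the full height and false for the halved height.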

> PDFTextStreamEngine probably miscalculates text height
> ------------------------------------------------------
>
>                 Key: PDFBOX-3175
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3175
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Leo
>         Attachments: MarketT_140815-1-marked-1-18.png, 
> MarketT_140815-1-marked-1.png, PDFBOX-3175-reduced.pdf, snapshot.png
>
>
> When parsing a PDF document, TextPosition is created with a constant text 
> height, about 2 times smaller than the character width, regardless of font size.
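> The following workaround to calculate dyDisplay fixes the issue:
> {code}
>         // glyph space -> text space: 1/1000 for most fonts,
>         // the font matrix's vertical scale for Type 3 fonts
>         float verticalScaling = 1/1000f;
>         if (font instanceof PDType3Font) {
>             Matrix fontMatrix = font.getFontMatrix();
>             verticalScaling = fontMatrix.getValue(1, 1);
>         }
>         // full bbox height (no division by 2), scaled by the font size
>         float dyDisplay = bbox.getHeight() * fontSize * verticalScaling;
> {code}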
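For comparison, here is the arithmetic of the workaround quoted above as a plain-Java sketch, with the PDFBox types (font, bbox, Matrix) replaced by floats. The bbox heights, the font size and the Type 3 vertical scale of 0.01 are hypothetical, and the sketch does not model the text rendering matrix. It only shows that, for the same inputs, the workaround gives twice the value of the halved formula.

```java
// Minimal sketch of the height arithmetic in the quoted workaround, with PDFBox types
// replaced by plain floats. All input values are hypothetical.
public class TextHeightSketch {

    /**
     * Workaround: full bbox height times font size, scaled from glyph space by 1/1000,
     * or by the font matrix's vertical scale (getValue(1, 1)) for a Type 3 font.
     */
    static float dyDisplayWorkaround(float bboxHeight, float fontSize,
                                     boolean isType3, float type3VerticalScale) {
        float verticalScaling = 1 / 1000f;
        if (isType3) {
            verticalScaling = type3VerticalScale;
        }
        return bboxHeight * fontSize * verticalScaling;
    }

    /** Same arithmetic, but starting from half the bbox height (bbox.getHeight() / 2). */
    static float dyDisplayHalved(float bboxHeight, float fontSize,
                                 boolean isType3, float type3VerticalScale) {
        return dyDisplayWorkaround(bboxHeight / 2, fontSize, isType3, type3VerticalScale);
    }

    public static void main(String[] args) {
        float fontSize = 20f;
        // Non-Type 3 font with a hypothetical bbox height of 1000 glyph units.
        System.out.println("workaround: " + dyDisplayWorkaround(1000f, fontSize, false, 0f));
        System.out.println("halved:     " + dyDisplayHalved(1000f, fontSize, false, 0f));
        // Hypothetical Type 3 font whose font matrix scales glyph space by 0.01.
        System.out.println("Type 3:     " + dyDisplayWorkaround(100f, fontSize, true, 0.01f));
    }
}
```

Up to float rounding, the first two lines show 20 and 10, which is the factor of 2 discussed in this issue.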



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
