[jira] [Comment Edited] (PDFBOX-3745) Wrong character width

Tilman Hausherr (JIRA) Wed, 05 Apr 2017 12:27:53 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957378#comment-15957378
 ]


Tilman Hausherr edited comment on PDFBOX-3745 at 4/5/17 7:27 PM:
-----------------------------------------------------------------

The font has a /Widths array that is too short. /LastChar is 119, but the 
code for "ä" is higher. According to the PDF specification, in that case the 
value of /MissingWidth is to be used. That entry doesn't exist in this font, 
so its default value of 0 applies, which, coincidentally, is what you 
mentioned and what works for this file (I tested that in debugging).
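
A minimal sketch of the lookup rule described above, for illustration only. It 
is not the PDFBox implementation: the class name, FirstChar = 32, the uniform 
width of 556 and the code 0xE4 for "ä" are assumptions. Only LastChar = 119 
and the /MissingWidth default of 0 come from this comment.

```java
import java.util.Arrays;

// Hypothetical stand-in for a simple font's width table; not a PDFBox class.
final class SimpleFontWidths {
    private final int firstChar;      // /FirstChar
    private final int lastChar;       // /LastChar
    private final float[] widths;     // /Widths, one entry per code from firstChar
    private final float missingWidth; // /MissingWidth, 0 when the entry is absent

    SimpleFontWidths(int firstChar, int lastChar, float[] widths, float missingWidth) {
        this.firstChar = firstChar;
        this.lastChar = lastChar;
        this.widths = widths;
        this.missingWidth = missingWidth;
    }

    // Returns the width in glyph space units (1/1000 of the font size).
    float getWidth(int code) {
        int index = code - firstChar;
        if (code >= firstChar && code <= lastChar && index < widths.length) {
            return widths[index];
        }
        // The code is not covered by /Widths: fall back to /MissingWidth.
        return missingWidth;
    }

    public static void main(String[] args) {
        float[] widths = new float[119 - 32 + 1];
        Arrays.fill(widths, 556f); // assumed uniform width, for the demo only
        SimpleFontWidths font = new SimpleFontWidths(32, 119, widths, 0f);
        System.out.println(font.getWidth('u'));  // code 117, inside the range
        System.out.println(font.getWidth(0xE4)); // assumed code for "ä", above 119
    }
}
```

With these sample values the first call returns the table entry (556) and the 
second falls back to the missing width (0).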

So I've tested what happens with our many test files if I change the code in 
{{PDFont.getWidth()}}...

There is a difference in the rendering of PDFBOX-3125, but that one is also 
rendered badly by Adobe Reader. Our rendering is just bad in a different way, 
so I don't have to investigate further. But there's also a difference in text 
extraction for PDFBOX-3061, and that one is more annoying because the result 
is worse than with Adobe Reader.


was (Author: tilman):
The font has a /Widths array that is too short. /LastChar is 119, but the 
code for "ä" is higher. According to the PDF specification, in that case the 
value of /MissingWidth is to be used. That entry doesn't exist in this font, 
so its default value of 0 applies, which, coincidentally, is what you 
mentioned and what works for this file (I tested that in debugging).

So I've tested what happens with our many test files if I change the code...

There is a difference in the rendering of PDFBOX-3125, but that one is also 
rendered badly by Adobe Reader. Our rendering is just bad in a different way, 
so I don't have to investigate further. But there's also a difference in text 
extraction for PDFBOX-3061, and that one is more annoying because the result 
is worse than with Adobe Reader.

> Wrong character width
> ---------------------
>
>                 Key: PDFBOX-3745
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3745
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.5
>         Environment: Windows 10
>            Reporter: Ch. Schlatter
>         Attachments: p421.jpg, p42.pdf, PDFBOX-3745-reduced.pdf
>
>
> I tried to convert a PDF file to an image, but there is an error in the 
> character width computation. As you can see inside the blue box, there are 
> gaps after every umlaut (ä, ö, ü). Some characters in the font don't 
> contain any width information. The distance between the characters is handled 
> by position adjustment. For example:
> [1., -278, ), -844, H, -722, ä, -556, u, -611, sliche P, -667, f, -333, lege] 
> TJ
> I guess there is an error in the font.getWidth() implementation. If I call 
> font.getWidth("ä") it returns 556 (instead of 0, which would fit).
> I attached the PDF and the converted image.
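
To illustrate why a wrong width produces the gaps described above, here is a 
rough sketch of how the horizontal advance of a TJ array adds up. It is a 
simplification, not PDFBox code: the class and method names are hypothetical, 
the font size of 10 is an assumed value, character and word spacing and 
horizontal scaling are ignored, and Java chars stand in for character codes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical helper, for illustration only.
final class TjAdvance {
    // Sums the horizontal advance of a TJ array in text space units.
    // Elements are either a String (shown text) or a Number (position
    // adjustment in thousandths of a unit, subtracted from the position).
    static float advance(List<Object> tjArray, IntToDoubleFunction glyphWidth, float fontSize) {
        float total = 0f;
        for (Object element : tjArray) {
            if (element instanceof Number) {
                total -= ((Number) element).floatValue() / 1000f * fontSize;
            } else {
                String text = (String) element;
                for (int i = 0; i < text.length(); i++) {
                    total += (float) glyphWidth.applyAsDouble(text.charAt(i)) / 1000f * fontSize;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The "ä, -556" fragment from the array quoted in the issue.
        List<Object> fragment = Arrays.<Object>asList("\u00e4", -556);
        float expected = advance(fragment, code -> 0, 10f);   // glyph width 0
        float reported = advance(fragment, code -> 556, 10f); // glyph width 556
        System.out.println("advance with width 0:   " + expected);
        System.out.println("advance with width 556: " + reported);
    }
}
```

With a glyph width of 0 the adjustment of -556 alone moves the position by 
one normal character width. With a glyph width of 556 the advance roughly 
doubles, which matches the extra gap after each umlaut.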



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
