[ 
https://issues.apache.org/jira/browse/PDFBOX-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959330#comment-15959330
 ] 

Tilman Hausherr commented on PDFBOX-3745:
-----------------------------------------

I did some more research. Yesterday's regression in text extraction occurred 
because in that file the space is missing from the /Widths array, so the new 
code returned 0, which then resulted in an average width being used as a 
fallback. I have changed the code to take the space width from the font itself, 
which is what was done before.
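
The fallback order described above can be illustrated with a minimal sketch. 
This is not the PDFBox code: the class, the method and the two maps are 
hypothetical stand-ins for the /Widths array and the embedded font program, 
and the order of the lookups is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in PDFBox.
class SpaceWidthSketch {
    static final int SPACE = 32; // character code of the space

    /**
     * Returns a width for the space character.
     * Assumed order: /Widths entry, then the font program, then an average.
     */
    static float spaceWidth(Map<Integer, Float> widthsArray,
                            Map<Integer, Float> fontProgramWidths,
                            float averageWidth) {
        Float fromWidths = widthsArray.get(SPACE);
        if (fromWidths != null && fromWidths > 0) {
            return fromWidths;
        }
        // A missing (or zero) /Widths entry must not be taken as a real width:
        // ask the font itself before resorting to the average.
        Float fromFont = fontProgramWidths.get(SPACE);
        if (fromFont != null && fromFont > 0) {
            return fromFont;
        }
        return averageWidth;
    }

    public static void main(String[] args) {
        Map<Integer, Float> widthsArray = new HashMap<>();
        widthsArray.put(65, 722f); // 'A' is present, the space is not
        Map<Integer, Float> fontProgramWidths = new HashMap<>();
        fontProgramWidths.put(SPACE, 278f);
        System.out.println(spaceWidth(widthsArray, fontProgramWidths, 500f));
    }
}
```

With the space absent from the hypothetical /Widths map, the width comes from 
the font program map rather than from the average.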

After this change there is a new difference in text extraction (cweb.pdf), but 
it is a minor improvement: a space has appeared that wasn't there before, and 
Adobe has it too.

> Wrong character width
> ---------------------
>
>                 Key: PDFBOX-3745
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3745
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.5
>         Environment: Windows 10
>            Reporter: Ch. Schlatter
>         Attachments: p421.jpg, p42.pdf, PDFBOX-3745-reduced.pdf
>
>
> I tried to convert a PDF file to an image, but there is an error in the 
> character width computation. As you can see inside the blue box, there are 
> gaps after every umlaut (ä, ö, ü). Some characters in the font don't 
> contain any width information. The distance between the characters is handled 
> by position adjustment. For example:
> [1., -278, ), -844, H, -722, ä, -556, u, -611, sliche P, -667, f, -333, lege] 
> TJ
> I guess there is an error in the font.getWidth() implementation. If I call 
> font.getWidth("ä") it returns 556 (instead of 0, which would fit).
> I attached the PDF and the converted image.
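
To make the reported gap concrete, here is a small sketch of how a glyph width 
and a TJ adjustment combine. It uses the text-space advance from the PDF 
specification, (width - adjustment) / 1000 * font size, and ignores character 
spacing, word spacing and horizontal scaling. The class name, method name and 
the font size of 10 are assumptions and are not taken from the report.

```java
// Hypothetical illustration only; not part of PDFBox.
class TjAdvanceSketch {
    /**
     * Horizontal advance for one glyph followed by one TJ adjustment.
     * Widths and adjustments are in thousandths of a text space unit;
     * the adjustment is subtracted, so a negative value moves to the right.
     */
    static double advance(double glyphWidth, double tjAdjustment, double fontSize) {
        return (glyphWidth - tjAdjustment) / 1000.0 * fontSize;
    }

    public static void main(String[] args) {
        double fontSize = 10.0; // assumed value
        // Width 0 for "ä": only the -556 adjustment moves the position.
        System.out.println(advance(0, -556, fontSize));
        // Width 556 for "ä": the glyph width and the adjustment add up.
        System.out.println(advance(556, -556, fontSize));
    }
}
```

Under these assumptions the second call yields twice the advance of the first, 
which would match the extra gap seen after each umlaut.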



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)