[ 
https://issues.apache.org/jira/browse/PDFBOX-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962518#comment-14962518
 ] 

Ben McCann commented on PDFBOX-3028:
------------------------------------

If you run PrintTextLocations on 
pdfbox/src/test/resources/input/sample_fonts_solidconvertor.pdf then it shows 
the width of a space as being roughly 10x the width of a character. That's 
crazy. Does anyone know if it's because that's a really screwy PDF or if it's 
because there's some bug in the way we calculate the width of a space?

{code}String[92.585,79.52399 fs=9.3624 xscale=9.268776 height=5.302922 
space=51.546127 width=5.561264]C{code}

Another reason it seems really wrong is that there are actual space characters 
being printed. So sometimes a space is an actual Unicode space character and 
sometimes it's just two characters not being near each other? In any case, 
here we have a space character being printed by PrintTextLocations with its 
width being {{5.561264}}, while PrintTextLocations simultaneously says that 
the width of a space in this font is {{51.546127}}, which it clearly isn't.

{code}String[137.29758,79.52399 fs=9.3624 xscale=9.268776 height=5.302922 
space=51.546127 width=5.561264] {code}
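
As a quick sanity check on the numbers above, here is a minimal sketch that pulls the {{space}} and {{width}} fields out of one line of PrintTextLocations output and prints their ratio. It only parses the text shown in this comment and does not call PDFBox; the class name {{SpaceWidthCheck}} and the helpers {{field}} and {{spaceToGlyphRatio}} are made up for illustration. For the line with the actual space character the ratio comes out at about 9.27, which happens to be very close to the {{xscale}} value on the same line. Whether that is a coincidence or a hint that the scale is being applied one time too many is not something this check can tell.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for eyeballing PrintTextLocations output; not part of PDFBox.
class SpaceWidthCheck {

    // Extracts a numeric "name=value" field from one line of output.
    static double field(String line, String name) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "=([0-9]+(?:\\.[0-9]+)?)");
        Matcher m = p.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no field '" + name + "' in: " + line);
        }
        return Double.parseDouble(m.group(1));
    }

    // Ratio of the reported width of a space in the font to the width of this glyph.
    static double spaceToGlyphRatio(String line) {
        return field(line, "space") / field(line, "width");
    }

    public static void main(String[] args) {
        // The line from this comment where the printed character is itself a space
        // (rejoined onto one line; the mail wrapped it).
        String line = "String[137.29758,79.52399 fs=9.3624 xscale=9.268776 "
                + "height=5.302922 space=51.546127 width=5.561264] ";
        System.out.printf("space/width = %.3f, xscale = %.3f%n",
                spaceToGlyphRatio(line), field(line, "xscale"));
    }
}
```

If the glyph on the line really is a space, one would expect a ratio near 1, so anything far from that on such a line is worth a closer look.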

> Text extraction broken for jbl example
> --------------------------------------
>
>                 Key: PDFBOX-3028
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3028
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Ben McCann
>         Attachments: jbl-example-com.pdf, spacing-test.patch
>
>




