[
https://issues.apache.org/jira/browse/PDFBOX-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962559#comment-14962559
]
John Hewson commented on PDFBOX-3028:
-------------------------------------
It's definitely odd. Ignoring any char-spacing/word-spacing, a space is almost
always going to be between 0.2 and 0.3 em wide. For justified paragraphs, let's say
0.1 to 0.4 em. A better approach might be to estimate the font's size in pt and
then derive the space size from that as a percentage.
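
To make the idea concrete, here is a minimal sketch in plain Java. It is not PDFBox code: the class and method names are hypothetical, and it assumes the effective font size in pt has already been estimated. The 0.25 em "typical" value is just the midpoint of the 0.2 to 0.3 em range above, and using the 0.1 em justified lower bound as the word-break threshold is my own assumption.

```java
/** Hypothetical helper: derives space widths from an estimated font size. */
final class SpaceWidthEstimate {
    // Fractions of an em, taken from the ranges quoted in the comment.
    static final double TYPICAL_SPACE_EM = 0.25; // midpoint of 0.2 to 0.3 em
    static final double MIN_SPACE_EM = 0.1;      // lower bound for justified text

    private SpaceWidthEstimate() {
    }

    /** Typical width of a space, in pt, for a font of the given size. */
    static double typicalSpaceWidth(double fontSizePt) {
        return fontSizePt * TYPICAL_SPACE_EM;
    }

    /** Smallest gap, in pt, that is still treated as a space (assumed threshold). */
    static double minSpaceWidth(double fontSizePt) {
        return fontSizePt * MIN_SPACE_EM;
    }

    /** True if a horizontal gap between two glyphs is wide enough to be a word break. */
    static boolean isWordGap(double gapPt, double fontSizePt) {
        return gapPt >= minSpaceWidth(fontSizePt);
    }
}
```

With a 12 pt font, a 3 pt gap (0.25 em) would count as a word break, while a 0.5 pt gap (about 0.04 em) would be treated as ordinary kerning.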
> Text extraction broken for jbl example
> --------------------------------------
>
> Key: PDFBOX-3028
> URL: https://issues.apache.org/jira/browse/PDFBOX-3028
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Ben McCann
> Attachments: jbl-example-com.pdf, spacing-test.patch
>
>