[ 
https://issues.apache.org/jira/browse/PDFBOX-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220788#comment-17220788
 ] 

Thierry Guérin commented on PDFBOX-5002:
----------------------------------------

Created [https://github.com/apache/pdfbox/pull/89], which fixes the problem.

> PDFTextStripper sometimes fuses two words on different lines
> ------------------------------------------------------------
>
>                 Key: PDFBOX-5002
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5002
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.21
>            Reporter: Thierry Guérin
>            Priority: Minor
>             Fix For: 2.0.22
>
>         Attachments: small&Big.pdf
>
>
> This happens when text in a big font is followed by at least two lines of 
> text in a smaller font: the last word of the first line is merged with the 
> first word of the second line.
> On the attached PDF, the extracted text is:
> {noformat}
> (...) some text awith smaller font (...)
> {noformat}
> instead of:
>  
> {noformat}
> (...) some text with a smaller font (...)
> {noformat}
> I often encounter this kind of problem on invoices, where the company address 
> (small text at the top right) is next to the company name & logo (big 
> centered text at the top).
>  
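
To make the symptom above easy to check for, here is a minimal sketch of a fused-word detector. It assumes the expected text is known in advance and that the extracted text has already been obtained (for example from PDFTextStripper, which is not called here). The class and method names are hypothetical and are not part of PDFBox or of the linked pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: flags tokens in the extracted
// text that look like two expected words run together.
class FusedWordCheck {

    // Returns every whitespace-separated token of `extracted` that is not
    // itself an expected word but splits into two expected words.
    static List<String> findFusedWords(String expected, String extracted) {
        Set<String> words = new HashSet<>(Arrays.asList(expected.trim().split("\\s+")));
        List<String> fused = new ArrayList<>();
        for (String token : extracted.trim().split("\\s+")) {
            if (words.contains(token)) {
                continue; // token is a legitimate word, nothing to report
            }
            // Try every split point; report the token once if any split works.
            for (int i = 1; i < token.length(); i++) {
                if (words.contains(token.substring(0, i))
                        && words.contains(token.substring(i))) {
                    fused.add(token);
                    break;
                }
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        // Strings taken from the issue description, without the "(...)" markers.
        String expected = "some text with a smaller font";
        String extracted = "some text awith smaller font";
        System.out.println(findFusedWords(expected, extracted)); // prints [awith]
    }
}
```

The check is deliberately simple: it only catches tokens made of exactly two expected words, which is the case reported here.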



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
