Thierry Guérin created PDFBOX-5002:
--------------------------------------

             Summary: PDFTextStripper sometimes fuses two words on different lines
                 Key: PDFBOX-5002
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5002
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 2.0.21
            Reporter: Thierry Guérin
             Fix For: 2.0.22
         Attachments: small&Big.pdf

This happens when text in a large font is followed by at least two lines of 
text in a smaller font: the last word of the first line is merged with the 
first word of the second line.

On the attached PDF, the extracted text is:
{noformat}
(...) some text awith smaller font (...)
{noformat}
instead of:
{noformat}
(...) some text with a smaller font (...)
{noformat}
I often encounter this kind of problem on invoices, where the company address 
(small text at the top right) is next to the company name & logo (big centered 
text at the top).
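
A rough sketch of how the symptom could be checked automatically, for example in a regression test. The class FusedWordCheck and its method findFusedWords are hypothetical helpers, not part of PDFBox. The sketch assumes the extracted string was obtained beforehand (for instance from PDFTextStripper.getText) and that a known-good expected string is available. It only flags tokens that are missing from the expected text but split cleanly into two expected words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; it is not part of PDFBox.
final class FusedWordCheck {

    // Returns the tokens of `extracted` that do not occur in `expected`
    // but can be split into two words that both occur in `expected`.
    static List<String> findFusedWords(String extracted, String expected) {
        Set<String> known =
                new HashSet<>(Arrays.asList(expected.trim().split("\\s+")));
        List<String> fused = new ArrayList<>();
        for (String token : extracted.trim().split("\\s+")) {
            if (known.contains(token)) {
                continue; // token is a legitimate word of the expected text
            }
            // Try every split point; report the token once if any split works.
            for (int i = 1; i < token.length(); i++) {
                if (known.contains(token.substring(0, i))
                        && known.contains(token.substring(i))) {
                    fused.add(token);
                    break;
                }
            }
        }
        return fused;
    }
}
```

Called with the two strings quoted above, findFusedWords reports the single token "awith", since both "a" and "with" occur in the expected text. The check is deliberately naive: it compares whitespace-separated tokens only and ignores word order.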