Incorrect output when word spacing is achieved by matrix translation
--------------------------------------------------------------------

                 Key: PDFBOX-881
                 URL: https://issues.apache.org/jira/browse/PDFBOX-881
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.3.1, 1.4.0
            Reporter: David Rodríguez Alfayate


When text is extracted from a PDF document in which word spacing is achieved 
by matrix translation, versions 1.3.x and 1.4 merge the separate words 
together.

This does not happen in the 1.2 branch. After some investigation, the error 
was introduced by a refactoring of the PDFStreamEngine class and is related to 
the textMatrixEnd computation. In the 1.2 branch the characterSpacingWidth was 
added after textMatrixEnd had been computed, but in 1.3 (and 1.4) the 
characterSpacingWidth is added to textMatrixEnd beforehand, so the gap between 
words is no longer visible and the system is unable to detect a new word.
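
The effect described above can be illustrated with a small standalone sketch. 
This is not PDFBox code: the class WordGapSketch, the Glyph type, the 
preAddSpacing flag, and the gap threshold are all hypothetical and assume a 
simplified one-dimensional layout in which a word break is inferred when the 
distance between the end of one glyph and the start of the next exceeds a 
threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names come from PDFBox.
class WordGapSketch {

    // Hypothetical glyph position along the baseline.
    static final class Glyph {
        final char ch;
        final double startX;
        final double endX;

        Glyph(char ch, double startX, double endX) {
            this.ch = ch;
            this.startX = startX;
            this.endX = endX;
        }
    }

    // Lays out glyphs of equal width. spacingAfter[i] is the extra
    // translation applied after glyph i. When preAddSpacing is true the
    // spacing is folded into the glyph's end position (the 1.3/1.4
    // behaviour as described in the report); otherwise the end position
    // covers the glyph only and the spacing is applied afterwards (1.2).
    static List<Glyph> layout(String text, double glyphWidth,
                              double[] spacingAfter, boolean preAddSpacing) {
        List<Glyph> glyphs = new ArrayList<>();
        double x = 0.0;
        for (int i = 0; i < text.length(); i++) {
            double spacing = spacingAfter[i];
            double end = preAddSpacing ? x + glyphWidth + spacing
                                       : x + glyphWidth;
            glyphs.add(new Glyph(text.charAt(i), x, end));
            x += glyphWidth + spacing; // the next glyph starts here either way
        }
        return glyphs;
    }

    // Emits a space whenever the gap to the previous glyph's end
    // exceeds the (assumed) threshold.
    static String extract(List<Glyph> glyphs, double gapThreshold) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null && g.startX - prev.endX > gapThreshold) {
                sb.append(' ');
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    // "HelloWorld" with no space glyph; the word gap is a pure
    // translation of 3.0 units after the fifth glyph.
    static String run(boolean preAddSpacing) {
        String text = "HelloWorld";
        double[] spacing = new double[text.length()];
        spacing[4] = 3.0;
        return extract(layout(text, 5.0, spacing, preAddSpacing), 1.5);
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In this model, adding the spacing after the end position leaves a gap of 3.0 
units between "o" and "W", which exceeds the threshold and yields a space. 
Folding the spacing into the end position reduces that gap to zero, so the two 
words are merged, which matches the symptom reported here.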

