[ 
https://issues.apache.org/jira/browse/PDFBOX-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967458#comment-14967458
 ] 

ASF subversion and git services commented on PDFBOX-3042:
---------------------------------------------------------

Commit 1709883 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709883 ]

PDFBOX-3042: don't multiply with fontSize, as this has already been done before

> Bad space calculation in text extraction
> ----------------------------------------
>
>                 Key: PDFBOX-3042
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3042
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>              Labels: regression
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX-3042-003177-p2-reduced.pdf, PDFBOX-3042-003177-p2.pdf
>
>
> Some debug output from attached reduced file:
> 2.0:
> {code}
> spaceWidthText: 0.25
> fontSizeText: 12.0
> horizontalScalingText: 1.0
> textRenderingMatrix.getScalingFactorX(): 12.0, textRenderingMatrix: [12.0,0.0,0.0,12.0,100.0,700.0]
> ctm.getScalingFactorX(): 1.0
> spaceWidthDisplay: 36.0
> String[100.0,91.0 fs=12.0 xscale=12.0 height=7.8808603 space=36.0 width=8.003998]B
> {code}
> 1.8:
> {code}
> spaceWidthText: 0.25
> fontSizeText: 12.0
> horizontalScalingText: 1.0
> textMatrix.getXScale(): 1.0, textMatrix: [[1.0,0.0,0.0][0.0,1.0,0.0][100.0,700.0,1.0]]
> ctm.getXScale(): 1.0
> spaceWidthDisp: 3.0
> String[100.0,91.0 fs=12.0 xscale=12.0 height=7.884 space=3.0 width=8.003998]B
> {code}
> stream content is
> {code}
> 1 0 0 1 0 0 cm
> n
> BT
> /F12 12 Tf
> 1 0 0 1 100 700 Tm
> (B) Tj
> ET
> {code}
> The cause is similar to PDFBOX-3019: a factor is applied twice. In 2.0, the
> fontSize is already folded into the "parameters" Matrix object, which is used
> to calculate "textRenderingMatrix", so multiplying by fontSize again inflates
> spaceWidthDisplay (36.0 instead of 3.0 in the output above). In 1.8,
> textStateParameters is set similarly, but it is not used in the calculation
> of spaceWidthDisp.
> The problem was discovered because the text extraction results differed.
> The problem did not appear in PDFBOX-3019 because fontSizeText was 1 there.
> The fix also solves the problem I mentioned at the end of PDFBOX-3038.
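
For illustration, the arithmetic behind the debug values can be sketched as
follows. This is a minimal sketch, not the PDFBox source: the class and method
names are hypothetical, and the formulas are inferred only from the numbers
printed above. Since ctm.getScalingFactorX() is 1.0 in the sample, the output
does not show whether the 2.0 code also multiplies by it, so that factor is
left out of the 2.0 variants here.

```java
// Hypothetical sketch; formulas are inferred from the debug output in the
// issue description, not copied from PDFBox.
final class SpaceWidthSketch {

    // 1.8 behaviour: fontSize is multiplied in explicitly, and the text
    // matrix scale (1.0 in the sample) does not contain it.
    static float spaceWidth18(float spaceWidthText, float fontSizeText,
            float horizontalScalingText, float textMatrixXScale, float ctmXScale) {
        return spaceWidthText * fontSizeText * horizontalScalingText
                * textMatrixXScale * ctmXScale;
    }

    // 2.0 before the fix: the text rendering matrix scale (12.0 in the
    // sample) already contains fontSize, so multiplying by fontSize again
    // counts it twice.
    static float spaceWidth20Doubled(float spaceWidthText, float fontSizeText,
            float horizontalScalingText, float trmScaleX) {
        return spaceWidthText * fontSizeText * horizontalScalingText * trmScaleX;
    }

    // 2.0 with the explicit fontSize factor dropped, as the commit message
    // describes ("don't multiply with fontSize").
    static float spaceWidth20Fixed(float spaceWidthText,
            float horizontalScalingText, float trmScaleX) {
        return spaceWidthText * horizontalScalingText * trmScaleX;
    }

    public static void main(String[] args) {
        // Values taken from the debug output of the reduced file.
        System.out.println(spaceWidth18(0.25f, 12.0f, 1.0f, 1.0f, 1.0f));
        System.out.println(spaceWidth20Doubled(0.25f, 12.0f, 1.0f, 12.0f));
        System.out.println(spaceWidth20Fixed(0.25f, 1.0f, 12.0f));
    }
}
```

With the values from the reduced file, the doubled variant gives 36.0 and the
other two give 3.0, which matches the space= values in the two debug outputs.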


