[ https://issues.apache.org/jira/browse/PDFBOX-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manuel Aristaran updated PDFBOX-1755:
-------------------------------------
Description:
For some TextPositions in this particular PDF (after being processed with
PDFStreamEngine), the getWidthOfSpace method returns 0.
I've traced the bug to `processEncodedText` in `PDFStreamEngine`: when
`spaceWidthText` is converted to display units, `textMatrix.getValue(0,0)`
returns 0. Because that value is a factor in the conversion expression, the
result is 0:
    float spaceWidthDisp = spaceWidthText * fontSizeText * horizontalScalingText
            * textMatrix.getValue(0, 0) * ctm.getValue(0, 0);
The conversion is correct if that factor is removed from the expression.
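To illustrate the failure mode, here is a minimal standalone sketch that does
not use PDFBox itself. It assumes the text matrix is an ordinary 2D affine
matrix and that the affected text is rotated by 90 degrees, which is one case
where element (0,0) is 0; whether that is what happens in the attached PDF is
not confirmed. The class name, the method names and the numeric values are
hypothetical, and the `xScale` alternative (taking the length of the matrix's
first row) is only one possible approach, not the fix adopted by the project.

```java
public class SpaceWidthSketch {

    /**
     * Mirrors the expression quoted in the report. textM00 and ctmM00 stand in
     * for textMatrix.getValue(0, 0) and ctm.getValue(0, 0).
     */
    public static float reportedWidth(float spaceWidthText, float fontSizeText,
                                      float horizontalScalingText,
                                      float textM00, float ctmM00) {
        return spaceWidthText * fontSizeText * horizontalScalingText
                * textM00 * ctmM00;
    }

    /**
     * Hypothetical alternative: the horizontal scale of an affine matrix taken
     * as the length of its first row (m00, m01), which stays non-zero under
     * rotation.
     */
    public static float xScale(float m00, float m01) {
        return (float) Math.hypot(m00, m01);
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from the attached PDF.
        float spaceWidthText = 0.25f;
        float fontSizeText = 12f;
        float horizontalScalingText = 1f;

        // A 90-degree rotation has first row (cos, sin) = (0, 1).
        float textM00 = 0f;
        float textM01 = 1f;
        float ctmM00 = 1f; // identity CTM assumed for simplicity

        float reported = reportedWidth(spaceWidthText, fontSizeText,
                horizontalScalingText, textM00, ctmM00);
        float alternative = spaceWidthText * fontSizeText * horizontalScalingText
                * xScale(textM00, textM01) * ctmM00;

        System.out.println("reported expression: " + reported);
        System.out.println("row-length scaling:  " + alternative);
    }
}
```

The sketch leaves `ctm.getValue(0, 0)` as a plain factor; a rotated page
transformation could zero that factor in the same way.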
was:
For some TextPositions in this particular PDF (after being processed with
PDFStreamEngine), the getWidthOfSpace returns 0.
I've traced the bug to `processEncodedText` in `PDFStreamEngine`: when
`spaceWidthText` is converted to display units, `textMatrix.getValue(0,0)`
returns 0. Being a factor in the conversion expression, sets the result to 0.
float spaceWidthDisp = spaceWidthText * fontSizeText * horizontalScalingText *
textMatrix.getValue(0, 0)
* ctm.getValue(0, 0);
> Wrong widthOfSpace
> ------------------
>
> Key: PDFBOX-1755
> URL: https://issues.apache.org/jira/browse/PDFBOX-1755
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.3
> Environment: Java 7, JRuby
> Reporter: Manuel Aristaran
> Attachments: tabla_subsidios.pdf