[ 
https://issues.apache.org/jira/browse/PDFBOX-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-317:
-------------------------------
    Component/s:     (was: Text extraction)

> PDFont.getStringWidth() returns incorrect values
> ------------------------------------------------
>
>                 Key: PDFBOX-317
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-317
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0, 2.0.0
>             Fix For: 2.0.0
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1819754
> Originally submitted by brettpowley on 2007-10-24 23:59.
> For some text in some documents, getStringWidth() returns an incorrect value. 
>  In some cases it returns zero, which is clearly not correct.  In others, it 
> returns something that is too short.  An example of this follows:
> On the page, the affected text is part of a line that reads "Cash flows from".  
> It is delivered to flushText in PDFTextStripper as multiple TextPositions; 
> the two shown below are the one containing "w" and the following one 
> containing "s fr".
> The first one looks like this:
> TextPosition: "w"  
> getX=62.824474 
> getWidth=6.731968 
> getWordSpacing=0.000000 
> getWidthOfSpace=2.224000 
> getXScale=1.000000
> glyphFactor=999.999939, getXScale=1.000000, getStringWidth=814.000000, 
> calculatedFontWidth=0.814000 
> averageWidth=0.546769, 
> widthUsingSpaces=2.224000  
> widthUsingFont=0.546769
> Note that, according to getStringWidth(), the width of this text is 0.814 
> (814 / 1000), meaning it would end at 62.82 + 0.814 = 63.63.
> According to getWidth(), it ought to end at 62.82 + 6.73 = 69.55.
> When we look at the next chunk of text:
> TextPosition: "s fr" 
> getX=69.336563 getWidth=12.518410 
> we see that it does in fact start immediately after the previous one -- so 
> the width from getStringWidth() for the first one was incorrect.
> The font is a PDType1Font and its name appears to be 
> "YOTPKO+HelveticaNeue-Bold*1".
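
The comparison in the report can be reproduced with plain arithmetic. The
following is a minimal sketch, not PDFBox code: the class name WidthCheck and
its two helper methods are hypothetical, and the numbers are copied from the
TextPosition dump above. It assumes, as the report's calculatedFontWidth line
does, that the getStringWidth() value is divided by 1000 and that no font
size is applied, since the report does not state one.

```java
public class WidthCheck {
    // Values copied from the TextPosition dump in the report.
    static final double W_X = 62.824474;         // getX of the "w" chunk
    static final double W_WIDTH = 6.731968;      // getWidth of the "w" chunk
    static final double W_STRING_WIDTH = 814.0;  // getStringWidth for "w"
    static final double NEXT_X = 69.336563;      // getX of the "s fr" chunk

    // End position if the getStringWidth() value is scaled by 1/1000
    // (assumption: mirrors calculatedFontWidth = 0.814 in the report).
    static double endFromStringWidth(double x, double stringWidth) {
        return x + stringWidth / 1000.0;
    }

    // End position according to TextPosition.getWidth().
    static double endFromWidth(double x, double width) {
        return x + width;
    }

    public static void main(String[] args) {
        double viaStringWidth = endFromStringWidth(W_X, W_STRING_WIDTH);
        double viaWidth = endFromWidth(W_X, W_WIDTH);

        // Distance between each estimated end and the start of the next chunk.
        System.out.printf("end via getStringWidth: %.3f, distance to next chunk: %.3f%n",
                viaStringWidth, Math.abs(NEXT_X - viaStringWidth));
        System.out.printf("end via getWidth:       %.3f, distance to next chunk: %.3f%n",
                viaWidth, Math.abs(NEXT_X - viaWidth));
    }
}
```

Under these assumptions, the end position derived from getWidth() lands much
closer to the start of the "s fr" chunk than the one derived from
getStringWidth(), which is the discrepancy the report describes.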



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
